TABLESAMPLE patch
Hello,
Attached is a basic implementation of the TABLESAMPLE clause. It's an SQL
standard clause and a couple of people have tried to submit it before, so I
think I don't need to explain at length what it does - basically it returns
a "random" sample of a table using a specified sampling method.
I implemented both SYSTEM and BERNOULLI sampling as specified by the SQL
standard. SYSTEM sampling does block-level sampling using the same
algorithm as ANALYZE; BERNOULLI scans the whole table and picks tuples randomly.
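To make the difference between the two methods concrete, here is a minimal standalone sketch (not code from the patch). It assumes a hypothetical demo PRNG and the hypothetical names demo_select_blocks and demo_bernoulli_rows. The block selection follows the plain form of Knuth's Algorithm S rather than the optimized loop in BlockSampler_Next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical demo PRNG state; the patch uses sampler_random_fract() instead. */
typedef struct
{
	uint32_t	state;
} DemoRng;

/* Returns a value strictly between 0 and 1. */
static double
demo_random_fract(DemoRng *rng)
{
	rng->state = rng->state * 1664525u + 1013904223u;	/* simple LCG step */
	return ((double) rng->state + 1.0) / 4294967297.0;	/* divide by 2^32 + 1 */
}

/*
 * SYSTEM-like: choose min(samplesize, nblocks) block numbers in ascending
 * order. Block t is taken with probability needed / remaining (Algorithm S).
 * Returns the number of block numbers written to out.
 */
static size_t
demo_select_blocks(DemoRng *rng, uint32_t nblocks, uint32_t samplesize,
				   uint32_t *out)
{
	uint32_t	selected = 0;

	assert(out != NULL);
	for (uint32_t t = 0; t < nblocks && selected < samplesize; t++)
	{
		uint32_t	remaining = nblocks - t;
		uint32_t	needed = samplesize - selected;

		if (demo_random_fract(rng) * remaining < needed)
			out[selected++] = t;
	}
	return selected;
}

/*
 * BERNOULLI-like: visit every tuple and keep each one independently with
 * probability "fraction" (0.0 to 1.0). Returns the number of kept tuples.
 */
static size_t
demo_bernoulli_rows(DemoRng *rng, size_t ntuples, double fraction, size_t *out)
{
	size_t		kept = 0;

	assert(out != NULL);
	for (size_t i = 0; i < ntuples; i++)
	{
		if (demo_random_fract(rng) <= fraction)
			out[kept++] = i;
	}
	return kept;
}
```

The block-level variant only touches the chosen blocks, while the per-tuple variant has to look at every tuple but gives each row an independent chance.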
There is an API for sampling methods which consists of 4 functions at the
moment - init, end, nextblock and nexttuple. I added a catalog which maps
the sampling method to the functions implementing this API. The grammar
creates a new TableSampleRange struct that I added for sampling. The parser
then uses the catalog to load information about the sampling method into
a TableSampleClause, which is then attached to the RTE. The planner checks
whether this parameter is present in the RTE and, if it finds it, creates a
plan with just one path - SampleScan. SampleScan implements the standard
executor API and calls the sampling method API as needed.
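As a rough illustration of how a scan node can drive such an API, here is a standalone sketch. It uses plain function pointers instead of fmgr calls, and every name in it (DemoScan, DemoSampleMethod, demo_sample_scan, the "every other block" method) is hypothetical. The attached code additionally has a reset function, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_INVALID_BLOCK	UINT32_MAX	/* stand-in for InvalidBlockNumber */
#define DEMO_INVALID_OFFSET 0			/* stand-in for InvalidOffsetNumber */

typedef struct DemoScan DemoScan;

/* The four callbacks named in the mail: init, nextblock, nexttuple, end. */
typedef struct
{
	void		(*init) (DemoScan *scan);
	uint32_t	(*nextblock) (DemoScan *scan);
	uint16_t	(*nexttuple) (DemoScan *scan, uint32_t blockno,
							  uint16_t maxoffset);
	void		(*end) (DemoScan *scan);
} DemoSampleMethod;

struct DemoScan
{
	const DemoSampleMethod *method;
	uint32_t	nblocks;			/* size of the fake table */
	uint16_t	tuples_per_block;	/* every fake block has this many tuples */
	void	   *tsmdata;			/* private state of the sampling method */
};

/* Private state of the demo method: current block and last returned tuple. */
typedef struct
{
	uint32_t	blockno;
	uint16_t	lt;
} DemoState;

static void
demo_init(DemoScan *scan)
{
	DemoState  *st = malloc(sizeof(DemoState));

	if (st == NULL)
		abort();
	st->blockno = DEMO_INVALID_BLOCK;
	st->lt = DEMO_INVALID_OFFSET;
	scan->tsmdata = st;
}

/* Demo policy: return blocks 0, 2, 4, ... until the table ends. */
static uint32_t
demo_nextblock(DemoScan *scan)
{
	DemoState  *st = scan->tsmdata;

	st->blockno = (st->blockno == DEMO_INVALID_BLOCK) ? 0 : st->blockno + 2;
	return (st->blockno >= scan->nblocks) ? DEMO_INVALID_BLOCK : st->blockno;
}

/* Return every tuple offset 1..maxoffset, then ask for the next block. */
static uint16_t
demo_nexttuple(DemoScan *scan, uint32_t blockno, uint16_t maxoffset)
{
	DemoState  *st = scan->tsmdata;

	(void) blockno;
	st->lt = (st->lt == DEMO_INVALID_OFFSET) ? 1 : (uint16_t) (st->lt + 1);
	if (st->lt > maxoffset)
		st->lt = DEMO_INVALID_OFFSET;
	return st->lt;
}

static void
demo_end(DemoScan *scan)
{
	free(scan->tsmdata);
	scan->tsmdata = NULL;
}

static const DemoSampleMethod demo_every_other_block = {
	demo_init, demo_nextblock, demo_nexttuple, demo_end
};

/* Driver loop in the spirit of SampleScan; returns how many tuples it saw. */
static size_t
demo_sample_scan(DemoScan *scan)
{
	size_t		returned = 0;

	scan->method->init(scan);
	for (;;)
	{
		uint32_t	blockno = scan->method->nextblock(scan);

		if (blockno == DEMO_INVALID_BLOCK)
			break;
		while (scan->method->nexttuple(scan, blockno,
									   scan->tuples_per_block)
			   != DEMO_INVALID_OFFSET)
			returned++;
	}
	scan->method->end(scan);
	return returned;
}
```

The scan node owns the loop and the buffer handling; the sampling method only decides which block and which offset comes next.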
It is possible to write custom sampling methods. The sampling method
parameters are not limited to just a percentage, as in the standard, but
are a dynamic list of expressions which is checked against the definition
of the init function in a similar fashion (although much simplified) to
how function calls are checked.
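The kind of check described above could look roughly like the following standalone sketch. It is an assumption about the general shape only: the type tags and demo_check_sample_args are hypothetical, and a real check would work on PostgreSQL type OIDs and presumably deal with coercion, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags standing in for PostgreSQL type OIDs. */
typedef enum
{
	DEMO_FLOAT4,
	DEMO_INT4,
	DEMO_TEXT
} DemoType;

/*
 * Compare the argument types given in the TABLESAMPLE clause with the ones
 * declared by the method's init function. Returns -1 when they match,
 * otherwise the index of the first mismatch (or the shorter length when
 * only the argument counts differ).
 */
static int
demo_check_sample_args(const DemoType *declared, size_t ndeclared,
					   const DemoType *given, size_t ngiven)
{
	size_t		common = (ndeclared < ngiven) ? ndeclared : ngiven;

	for (size_t i = 0; i < common; i++)
	{
		if (declared[i] != given[i])
			return (int) i;
	}
	if (ndeclared != ngiven)
		return (int) common;
	return -1;
}
```

A method that wants, say, a percentage plus an extra integer parameter would simply declare both in its init function, and the same comparison applies.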
Notable missing parts are:
- proper costing and returned row count estimation - given the dynamic
nature of the parameters I think we'll need to let the sampling method
do this, so there will have to be a fifth function in the API.
- ruleutils support (it needs a bit of code in the get_from_clause_item
function)
- docs are sparse at the moment
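For the row count estimation item, a per-method estimate might be as simple as the standalone sketch below. This is only a guess at what such a function could compute; the names are hypothetical and nothing like it exists in the attached patch. The block count formula mirrors the samplesize computation in tsm_system_init, with a clamp added because no more blocks can be returned than the table has.

```c
#include <assert.h>
#include <stdint.h>

/* BERNOULLI: each row is kept with probability percent / 100. */
static double
demo_estimate_bernoulli_rows(double reltuples, double percent)
{
	return reltuples * (percent / 100.0);
}

/* SYSTEM: number of blocks sampled, as computed in tsm_system_init. */
static uint32_t
demo_system_blocks(uint32_t tblocks, double percent)
{
	uint32_t	n = 1 + (uint32_t) (tblocks * (percent / 100.0));

	return (n > tblocks) ? tblocks : n;		/* clamp is an added assumption */
}

/* SYSTEM: assume rows are spread evenly over the blocks. */
static double
demo_estimate_system_rows(double reltuples, uint32_t tblocks, double percent)
{
	if (tblocks == 0)
		return 0.0;
	return reltuples * ((double) demo_system_blocks(tblocks, percent) / tblocks);
}
```

Because the parameters of a custom method are free-form, only the method itself knows how they map to a row count, which is why this would have to live behind the API.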
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
tablesample-v1.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..250ae29 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,38 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ The <literal>TABLESAMPLE</literal> clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of the rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ one of:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept only a single argument,
+ which is the percentage (a floating-point value from 0 to 100) of the
+ rows to be returned.
+ The <literal>SYSTEM</literal> sampling method does block-level
+ sampling with each block having the same chance of being selected and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> sampling method scans the whole table
+ and returns individual rows with equal probability.
+ The optional numeric parameter of <literal>REPEATABLE</literal> is used
+ as the random seed for sampling.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..595737c 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ transam tsm
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/tsm/Makefile b/src/backend/access/tsm/Makefile
new file mode 100644
index 0000000..73bbbd7
--- /dev/null
+++ b/src/backend/access/tsm/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for access/tsm
+#
+# IDENTIFICATION
+# src/backend/access/tsm/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/tsm
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = tsm_system.o tsm_bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/tsm/tsm_bernoulli.c b/src/backend/access/tsm/tsm_bernoulli.c
new file mode 100644
index 0000000..c273ca6
--- /dev/null
+++ b/src/backend/access/tsm/tsm_bernoulli.c
@@ -0,0 +1,135 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_bernoulli.c
+ * interface routines for BERNOULLI table sample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/access/tsm/tsm_bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/tsm_bernoulli.h"
+
+#include "nodes/execnodes.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+
+
+/* Sampler state for the BERNOULLI method */
+typedef struct
+{
+ long seed;
+ BlockNumber tblocks;
+ BlockNumber blockno;
+ float percent;
+ OffsetNumber lt; /* last tuple returned from current block */
+} BernoulliSamplerData;
+
+
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ long seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_GETARG_FLOAT4(2);
+ Relation rel = scanstate->ss.ss_currentRelation;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be a numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks = RelationGetNumberOfBlocks(rel);
+ sampler->blockno = InvalidBlockNumber;
+ sampler->percent = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+
+ sampler_setseed(seed);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+
+ /* Advance first, then check, so that an empty relation yields no blocks. */
+ sampler->blockno = (sampler->blockno == InvalidBlockNumber) ?
+ 0 : sampler->blockno + 1;
+ if (sampler->blockno >= sampler->tblocks)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ double percent = sampler->percent;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /* Every tuple has percent chance of being returned */
+ while (sampler_random_fract() > percent)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_setseed(sampler->seed);
+
+ PG_RETURN_VOID();
+}
diff --git a/src/backend/access/tsm/tsm_system.c b/src/backend/access/tsm/tsm_system.c
new file mode 100644
index 0000000..5834078
--- /dev/null
+++ b/src/backend/access/tsm/tsm_system.c
@@ -0,0 +1,124 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_system.c
+ * interface routines for system table sample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/access/tsm/tsm_system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/tsm_system.h"
+
+#include "nodes/execnodes.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+
+
+/* Sampler state for the SYSTEM method; wraps the Algorithm S block sampler */
+typedef struct
+{
+ BlockSamplerData bs;
+ long seed;
+ BlockNumber tblocks;
+ int samplesize;
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ long seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_GETARG_FLOAT4(2);
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be a numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks = RelationGetNumberOfBlocks(scanstate->ss.ss_currentRelation);
+ sampler->samplesize = 1 + (int) (sampler->tblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ sampler_setseed(seed);
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ sampler_setseed(sampler->seed);
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize);
+
+ PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..5598244 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesamplemethod.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 732ab22..4b011c7 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -88,10 +78,6 @@ static BufferAccessStrategy vac_strategy;
static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -947,94 +933,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1089,6 +987,8 @@ acquire_sample_rows(Relation onerel, int elevel,
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
OldestXmin = GetOldestXmin(onerel, true);
+ /* Seed the sampler random number generator */
+ sampler_setseed(random());
/* Prepare for sampling block numbers */
BlockSampler_Init(&bs, totalblocks, targrows);
/* Prepare for sampling rows */
@@ -1249,7 +1149,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1308,13 +1208,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
/*
* These two routines embody Algorithm Z from "Random sampling with a
* reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
@@ -1333,7 +1226,7 @@ double
anl_init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
+ return exp(-log(sampler_random_fract()) / n);
}
double
@@ -1348,7 +1241,7 @@ anl_get_next_S(double t, int n, double *stateptr)
double V,
quot;
- V = anl_random_fract(); /* Generate V */
+ V = sampler_random_fract(); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -1380,7 +1273,7 @@ anl_get_next_S(double t, int n, double *stateptr)
tmp;
/* Generate U and X */
- U = anl_random_fract();
+ U = sampler_random_fract();
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -1409,7 +1302,7 @@ anl_get_next_S(double t, int n, double *stateptr)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 332f04a..2b1186b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -725,6 +725,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -951,6 +952,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ pname = sname = "Sample Scan";
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1068,6 +1072,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1320,6 +1325,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2148,6 +2154,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 7027d7f..1826059 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index d5079ef..613f799 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index e27c062..a1cba97 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..818cddd
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,388 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_tablesamplemethod.h"
+#include "executor/executor.h"
+#include "access/relscan.h"
+#include "executor/nodeSamplescan.h"
+#include "parser/parsetree.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate, int eflags);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+
+/*
+ * Initialize the sampling method - loads function info and
+ * calls the tsminit function.
+ *
+ * We need special handling for this because the tsminit function
+ * is allowed to take optional additional arguments.
+ */
+static void
+InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(scanstate->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(scanstate->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(scanstate->tsmnexttuple));
+ fmgr_info(tablesample->tsmend, &(scanstate->tsmend));
+ fmgr_info(tablesample->tsmreset, &(scanstate->tsmreset));
+
+ InitFunctionCallInfoData(fcinfo, &scanstate->tsminit,
+ list_length(args) + 1,
+ InvalidOid, NULL, NULL);
+
+ /* First arg is always SampleScanState */
+ fcinfo.arg[0] = PointerGetDatum(scanstate);
+ fcinfo.argnull[0] = false;
+
+ i = 1;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ /* REPEATABLE was not specified */
+ if (fcinfo.argnull[1])
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+ (void) FunctionCallInvoke(&fcinfo);
+}
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is a workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ EState *estate;
+ TupleTableSlot *slot;
+ BlockNumber blockno = InvalidBlockNumber;
+ Snapshot snapshot;
+ Relation relation;
+ bool found = false;
+ bool retry = false;
+ OffsetNumber tupoffset, maxoffset;
+ Buffer buffer;
+ Page page;
+ HeapTuple tuple = &(node->tup);
+
+ /*
+ * get information from the estate and scan state
+ */
+ estate = node->ss.ps.state;
+ snapshot = estate->es_snapshot;
+ slot = node->ss.ss_ScanTupleSlot;
+ relation = node->ss.ss_currentRelation;
+ buffer = node->openbuffer;
+
+ if (BufferIsValid(buffer))
+ {
+ blockno = BufferGetBlockNumber(buffer);
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ /*
+ * get the next tuple from the table
+ */
+ for (;;)
+ {
+ ItemId itemid;
+
+ /* Load next block if needed. */
+ if (!BufferIsValid(buffer))
+ {
+ blockno = DatumGetUInt32(FunctionCall2(&node->tsmnextblock,
+ PointerGetDatum(node),
+ BoolGetDatum(retry)));
+ /* No more blocks to fetch */
+ if (!BlockNumberIsValid(blockno))
+ break;
+
+ buffer = ReadBufferExtended(relation, MAIN_FORKNUM, blockno,
+ RBM_NORMAL, NULL);
+ LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+ node->openbuffer = buffer;
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ tupoffset = DatumGetUInt16(FunctionCall4(&node->tsmnexttuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset),
+ BoolGetDatum(retry)));
+ /* Go to next block. */
+ if (!OffsetNumberIsValid(tupoffset))
+ {
+ UnlockReleaseBuffer(buffer);
+ node->openbuffer = buffer = InvalidBuffer;
+ continue;
+ }
+ retry = true;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_tableOid = RelationGetRelid(relation);
+ tuple->t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&tuple->t_self, blockno, tupoffset);
+
+ /* Found visible tuple, return it. */
+ if (HeapTupleSatisfiesVisibility(tuple, snapshot, buffer))
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (found)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ buffer, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation sequentially and returns the next qualifying
+ * tuple while calling the sampling method functions.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+ node->ss.ss_currentScanDesc = NULL;
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ /*
+ * A SampleScan is a leaf node of the plan tree, so it must not have any
+ * child plans.
+ */
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ scanstate->openbuffer = InvalidBuffer;
+
+ InitSamplingMethod(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell sampling function that we finished the scan.
+ */
+ FunctionCall1(&node->tsmend, PointerGetDatum(node));
+
+ if (BufferIsValid(node->openbuffer))
+ {
+ UnlockReleaseBuffer(node->openbuffer);
+ node->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *node)
+{
+ if (BufferIsValid(node->openbuffer))
+ {
+ UnlockReleaseBuffer(node->openbuffer);
+ node->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ FunctionCall1(&node->tsmreset, PointerGetDatum(node));
+
+ ExecScanReScan((ScanState *) node);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 6b1bf7b..47769d0 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -628,6 +628,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2006,6 +2022,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2138,6 +2155,34 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4076,6 +4121,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4724,6 +4772,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_PrivGrantee:
retval = _copyPrivGrantee(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index d5db71d..9125b43 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2324,6 +2324,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2443,6 +2444,30 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3151,6 +3176,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_PrivGrantee:
retval = _equalPrivGrantee(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index ae857a0..557aa3a 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3209,6 +3209,16 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index edbd09f..be91e13 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -578,6 +578,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -2391,6 +2399,30 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2420,6 +2452,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2887,6 +2920,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3228,6 +3264,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a3efdd4..50aca4e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,40 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1216,6 +1250,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1311,6 +1346,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 449fdc3..53fa356 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,8 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +425,34 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+
+ /*
+ * We don't support pushing join clauses into the quals of a sample scan, but
+ * it could still have required parameterization due to LATERAL refs in
+ * its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ add_path(rel, create_samplescan_path(root, rel, required_outer));
+
+ /*
+ * There is only one plan to consider but we still need to set
+ * parameters for RelOptInfo.
+ */
+ set_cheapest(rel);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 659daa2..615c3f5 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -219,6 +219,54 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_samplescan(Path *path, PlannerInfo *root,
+ RelOptInfo *baserel, ParamPathInfo *param_info)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_sample_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Mark the path with the correct row estimate */
+ if (param_info)
+ path->rows = param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ NULL,
+ &spc_sample_page_cost);
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_sample_page_cost * baserel->pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * baserel->tuples;
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bf8dbe0..86dc6e1 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -57,6 +57,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -99,6 +101,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -227,6 +230,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -342,6 +346,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -545,6 +556,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1132,6 +1144,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3317,6 +3368,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 4d3fbca..0d78f27 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -446,6 +446,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 579d021..7da1a44 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2163,6 +2163,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 319e8b2..766d276 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,26 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+Path *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ Path *pathnode = makeNode(Path);
+
+ pathnode->pathtype = T_SampleScan;
+ pathnode->parent = rel;
+ pathnode->param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->pathkeys = NIL; /* samplescan has unordered result */
+
+ cost_samplescan(pathnode, root, rel, pathnode->param_info);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1921,6 +1941,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4b5009b..87a797a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -447,6 +447,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -611,8 +612,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10109,6 +10110,12 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample opt_alias_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10404,7 +10411,6 @@ relation_expr_list:
| relation_expr_list ',' relation_expr { $$ = lappend($1, $3); }
;
-
/*
* Given "UPDATE foo set set ...", we have to decide without looking any
* further ahead whether the first "set" is an alias or the UPDATE's SET
@@ -10434,6 +10440,30 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $2;
+ n->relation = $1;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = lcons($6, $4);
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' Iconst ')' { $$ = makeIntConst($3, @3); }
+ | /*EMPTY*/ { $$ = makeNullAConst(-1); }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13216,7 +13246,6 @@ unreserved_keyword:
| RELATIVE_P
| RELEASE
| RENAME
- | REPEATABLE
| REPLACE
| REPLICA
| RESET
@@ -13391,6 +13420,7 @@ type_func_name_keyword:
| OVERLAPS
| RIGHT
| SIMILAR
+ | TABLESAMPLE
| VERBOSE
;
@@ -13459,6 +13489,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+ | REPEATABLE
| RETURNING
| SELECT
| SESSION_USER
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 4931dca..c246a9c 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -413,6 +416,19 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
+{
+ RangeTblEntry *rte;
+ TableSampleClause *tablesample = NULL;
+
+ rte = transformTableEntry(pstate, r->relation);
+ tablesample = ParseTableSample(pstate, r->method, r->args);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -421,7 +437,7 @@ transformTableEntry(ParseState *pstate, RangeVar *r)
{
RangeTblEntry *rte;
- /* We need only build a range table entry */
+ /* We first need to build a range table entry */
rte = addRangeTableEntry(pstate, r, r->alias,
interpretInhOption(r->inhOpt), true);
@@ -1121,6 +1137,26 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 9ebd3fd..77a28ac 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesamplemethod.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -760,6 +762,104 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+extern TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, List *sampleargs)
+{
+ HeapTuple tuple;
+ Form_pg_tablesamplemethod tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, pronargs;
+ Oid actual_arg_types[FUNC_MAX_ARGS];
+ Oid declared_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the table sample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("table sampling method \"%s\" does not exist",
+ samplemethod)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesamplemethod) GETSTRUCT(tuple);
+
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+
+ ReleaseSysCache(tuple);
+
+ /* Load the table sample method's init procedure. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ pronargs = procform->pronargs;
+ Assert(pronargs >= 3);
+
+ /*
+ * First parameter is used to pass the SampleScanState,
+ * skip the processing for it here, just assert that it's the correct type.
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ pronargs--;
+ memcpy(declared_arg_types, procform->proargtypes.values + 1,
+ pronargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Transform the list of arguments ... */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *arg = transformExpr(pstate, (Node *) lfirst(larg), EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ fargs = lappend(fargs, arg);
+
+ actual_arg_types[nargs++] = argtype;
+ }
+
+ /*
+ * Check if parameters are correct.
+ *
+ * XXX: can we do better at hinting here?
+ */
+ if (pronargs != nargs ||
+ !can_coerce_type(pronargs, actual_arg_types, declared_arg_types,
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameters for TABLESAMPLE method \"%s\"",
+ samplemethod)));
+
+ /* perform the necessary typecasting of arguments */
+ make_fn_arguments(pstate, fargs, actual_arg_types, declared_arg_types);
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 94d951c..6832e0b 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesamplemethod.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesamplemethod_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index c7b745e..f311c74 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o rbtree.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..c07f01e
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,131 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Block sampling routines shared by ANALYZE and TABLESAMPLE.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+static unsigned short _sampler_seed[3] = { 0x330e, 0xabcd, 0x1234 };
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler is used for stage one of our new two-stage tuple
+ * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
+ * "Large DB"). It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires a sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+
+void
+sampler_setseed(long seed)
+{
+ _sampler_seed[0] = 0x330e;
+ _sampler_seed[1] = (unsigned short) seed;
+ _sampler_seed[2] = (unsigned short) (seed >> 16);
+}
+
+/* Select a random value R uniformly distributed in (0 - 1) */
+double
+sampler_random_fract(void)
+{
+ return pg_erand48(_sampler_seed);
+}
diff --git a/src/include/access/tsm_bernoulli.h b/src/include/access/tsm_bernoulli.h
new file mode 100644
index 0000000..9488710
--- /dev/null
+++ b/src/include/access/tsm_bernoulli.h
@@ -0,0 +1,19 @@
+/*--------------------------------------------------------------------------
+ * tsm_bernoulli.h
+ * Header file for BERNOULLI table sampling method.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/access/tsm_bernoulli.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TSM_BERNOULLI_H
+#define TSM_BERNOULLI_H
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+
+#endif /* TSM_BERNOULLI_H */
diff --git a/src/include/access/tsm_system.h b/src/include/access/tsm_system.h
new file mode 100644
index 0000000..37253da
--- /dev/null
+++ b/src/include/access/tsm_system.h
@@ -0,0 +1,19 @@
+/*--------------------------------------------------------------------------
+ * tsm_system.h
+ * Header file for SYSTEM table sampling method.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/access/tsm_system.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TSM_SYSTEM_H
+#define TSM_SYSTEM_H
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+
+#endif /* TSM_SYSTEM_H */
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index bde1a84..d40cfe6 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesamplemethod_name_index, 3262, on pg_tablesamplemethod using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3262
+DECLARE_UNIQUE_INDEX(pg_tablesamplemethod_oid_index, 3263, on pg_tablesamplemethod using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3263
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 910cfc6..fdd83bb 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5104,6 +5104,27 @@ DESCR("rank of hypothetical row without gaps");
DATA(insert OID = 3993 ( dense_rank_final PGNSP PGUID 12 1 0 2276 0 f f f f f f i 2 0 20 "2281 2276" "{2281,2276}" "{i,v}" _null_ _null_ hypothetical_dense_rank_final _null_ _null_ _null_ ));
DESCR("aggregate final function");
+DATA(insert OID = 3265 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3266 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 23 "2281" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3267 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 21 "2281" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3268 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3269 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+
+DATA(insert OID = 3271 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3272 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 23 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3273 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 21 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3274 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3275 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesamplemethod.h b/src/include/catalog/pg_tablesamplemethod.h
new file mode 100644
index 0000000..229d8d2
--- /dev/null
+++ b/src/include/catalog/pg_tablesamplemethod.h
@@ -0,0 +1,68 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesamplemethod.h
+ * definition of the table scan methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesamplemethod.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLEMETHOD_H
+#define PG_TABLESAMPLEMETHOD_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_tablesamplemethod definition. cpp turns this into
+ * typedef struct FormData_pg_tablesamplemethod
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3261
+
+CATALOG(pg_tablesamplemethod,3261)
+{
+ NameData tsmname; /* sampling method name */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockNumber if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmend; /* end scan function */
+ regproc tsmreset; /* reset state - used by rescan */
+} FormData_pg_tablesamplemethod;
+
+/* ----------------
+ * Form_pg_tablesamplemethod corresponds to a pointer to a tuple with
+ * the format of pg_tablesamplemethod relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesamplemethod *Form_pg_tablesamplemethod;
+
+/* ----------------
+ * compiler constants for pg_tablesamplemethod
+ * ----------------
+ */
+#define Natts_pg_tablesamplemethod 6
+#define Anum_pg_tablesamplemethod_tsmname 1
+#define Anum_pg_tablesamplemethod_tsminit 2
+#define Anum_pg_tablesamplemethod_tsmnextblock 3
+#define Anum_pg_tablesamplemethod_tsmnexttuple 4
+#define Anum_pg_tablesamplemethod_tsmend 5
+#define Anum_pg_tablesamplemethod_tsmreset 6
+
+/* ----------------
+ * initial contents of pg_tablesamplemethod
+ * ----------------
+ */
+
+DATA(insert OID = 3264 ( system tsm_system_init tsm_system_nextblock tsm_system_nexttuple tsm_system_end tsm_system_reset ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3270 ( bernoulli tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple tsm_bernoulli_end tsm_bernoulli_reset ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLEMETHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41b13b2..b7f3129 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1212,6 +1212,26 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+
+ /* Sampling method functions. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+
+ Buffer openbuffer; /* currently open buffer */
+ HeapTupleData tup; /* last tuple */
+
+ void *tsmdata; /* for use by table scan method */
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index bc71fea..cca592e 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -51,6 +51,7 @@ typedef enum NodeTag
T_BitmapOr,
T_Scan,
T_SeqScan,
+ T_SampleScan,
T_IndexScan,
T_IndexOnlyScan,
T_BitmapIndexScan,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -413,6 +415,8 @@ typedef enum NodeTag
T_XmlSerialize,
T_WithClause,
T_CommonTableExpr,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 5eaa435..cc1dd40 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -307,6 +307,21 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - information about a sampling method
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmend;
+ Oid tsmreset;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -507,6 +522,20 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than the SQL standard, so we pass generic function
+ * arguments to the sampling method.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -751,6 +780,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 48203a0..8427b44 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -278,6 +278,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 75e2afb..889c61c 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,8 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
+ ParamPathInfo *param_info);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 26b17f5..b96f903 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index e14dc9a..e565082 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -312,7 +312,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD)
PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD)
PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD)
-PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD)
+PG_KEYWORD("repeatable", REPEATABLE, RESERVED_KEYWORD)
PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD)
PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD)
PG_KEYWORD("reset", RESET, UNRESERVED_KEYWORD)
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 4423bc0..0202bf5 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,9 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod, List *args);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 48ebf59..1ba06b6 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..607f75f
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,43 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Vitter reservoir sampling functions */
+extern double vitter_init_selection_state(int n);
+extern double vitter_get_next_S(double t, int n, double *stateptr);
+
+/* Random generator */
+extern void sampler_setseed(long seed);
+extern double sampler_random_fract(void);
+
+#endif /* SAMPLING_H */
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index f97229f..29244c7 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..970d4da 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesamplemethod|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..79ed140
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,68 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+(10 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+ 9
+(6 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+DROP TABLE test_tablesample;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 62cc198..cf789dc 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity tablesample
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 07fc827..852fed9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -151,3 +151,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..5f6e828
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,12 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+DROP TABLE test_tablesample;
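As a reading aid for the sampling.c part of the patch above, here is a minimal standalone sketch of Knuth's Algorithm S as BlockSampler_Next applies it: one random draw per selected block, with the skip probability p shrunk in place instead of drawing again for each skipped block. The Demo* names and the small 48-bit generator are assumptions made for illustration only; the patch itself calls pg_erand48() and works on BlockNumber.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 48-bit LCG standing in for the patch's sampler RNG. */
typedef struct
{
    uint64_t state;
} DemoRng;

static void
demo_rng_seed(DemoRng *rng, uint32_t seed)
{
    /* seed fills the high 32 bits, a fixed constant the low 16 */
    rng->state = ((uint64_t) seed << 16) | 0x330eULL;
}

static double
demo_rng_fract(DemoRng *rng)
{
    const uint64_t mask = (1ULL << 48) - 1;

    rng->state = (rng->state * 0x5DEECE66DULL + 0xBULL) & mask;
    return (double) rng->state / (double) (1ULL << 48);  /* in [0, 1) */
}

/* Same four counters as BlockSamplerData in the patch. */
typedef struct
{
    uint32_t N;  /* number of blocks, known in advance */
    int      n;  /* desired sample size */
    uint32_t t;  /* current block number */
    int      m;  /* blocks selected so far */
} DemoBlockSampler;

static void
demo_sampler_init(DemoBlockSampler *bs, uint32_t nblocks, int samplesize)
{
    bs->N = nblocks;
    bs->n = samplesize;
    bs->t = 0;
    bs->m = 0;
}

static bool
demo_sampler_has_more(const DemoBlockSampler *bs)
{
    return (bs->t < bs->N) && (bs->m < bs->n);
}

static uint32_t
demo_sampler_next(DemoBlockSampler *bs, DemoRng *rng)
{
    uint32_t K = bs->N - bs->t;  /* remaining blocks */
    int      k = bs->n - bs->m;  /* blocks still to sample */
    double   V;
    double   p;

    assert(demo_sampler_has_more(bs));  /* hence K > 0 and k > 0 */

    if ((uint32_t) k >= K)
    {
        /* need all the rest */
        bs->m++;
        return bs->t++;
    }

    V = demo_rng_fract(rng);
    p = 1.0 - (double) k / (double) K;  /* probability of skipping */
    while (V < p)
    {
        bs->t++;                        /* skip this block */
        K--;
        p *= 1.0 - (double) k / (double) K;
    }

    bs->m++;                            /* select this block */
    return bs->t++;
}

/* Fill out[] (room for samplesize entries) and return the number chosen. */
static int
demo_sample_blocks(uint32_t nblocks, int samplesize, uint32_t seed,
                   uint32_t *out)
{
    DemoBlockSampler bs;
    DemoRng          rng;
    int              count = 0;

    demo_rng_seed(&rng, seed);
    demo_sampler_init(&bs, nblocks, samplesize);
    while (demo_sampler_has_more(&bs))
        out[count++] = demo_sampler_next(&bs, &rng);
    return count;
}
```

Calling demo_sample_blocks(1000, 10, 42, out) twice should give the same ten ascending block numbers both times, which is the property REPEATABLE relies on. When the table has fewer blocks than the requested sample size, every block is returned.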
On 11/12/14 00:24, Petr Jelinek wrote:
Hello,
Attached is a basic implementation of TABLESAMPLE clause. It's SQL
standard clause and couple of people tried to submit it before so I
think I don't need to explain in length what it does - basically returns
"random" sample of a table using a specified sampling method.
I implemented both SYSTEM and BERNOULLI sampling as specified by SQL
standard. The SYSTEM sampling does block level sampling using same
algorithm as ANALYZE, BERNOULLI scans whole table and picks tuple randomly.
There is API for sampling methods which consists of 4 functions at the
moment - init, end, nextblock and nexttuple. I added catalog which maps
the sampling method to the functions implementing this API. The grammar
creates new TableSampleRange struct that I added for sampling. Parser
then uses the catalog to load information about the sampling method into
TableSampleClause which is then attached to RTE. Planner checks for if
this parameter is present in the RTE and if it finds it it will create
plan with just one path - SampleScan. SampleScan implements standard
executor API and calls the sampling method API as needed.
It is possible to write custom sampling methods. The sampling method
parameters are not limited to just percent number as in standard but
dynamic list of expressions which is checked against the definition of
the init function in a similar fashion (although much simplified) as
function calls are.
Notable lacking parts are:
- proper costing and returned row count estimation - given the dynamic
nature of parameters I think for we'll need to let the sampling method
do this, so there will have to be fifth function in the API.
- ruleutils support (it needs a bit of code in get_from_clause_item
function)
- docs are sparse at the moment
Forgot the obligatory:
The research leading to these results has received funding from the
European Union's Seventh Framework Programme (FP7/2007-2013) under grant
agreement n° 318633.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
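For readers following the thread, the sampling method API described in the message above (init, nextblock, nexttuple, end) can be pictured roughly as the sketch below. It is only an illustration under stated assumptions: the patch registers fmgr functions in the pg_tablesamplemethod catalog, whereas this sketch uses plain C function pointers, and every Demo* name, the toy random generator, and the fixed tuples-per-block count are hypothetical. The driver loop plays the role that SampleScan has in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_INVALID_BLOCK  UINT32_MAX  /* stand-in for InvalidBlockNumber */
#define DEMO_INVALID_OFFSET 0           /* stand-in for InvalidOffsetNumber */

/* Hypothetical function-pointer view of one sampling method. */
typedef struct
{
    void     *(*init) (uint32_t nblocks, uint16_t tuples_per_block,
                       uint32_t seed, double percent);
    uint32_t  (*nextblock) (void *state);
    uint16_t  (*nexttuple) (void *state);
    void      (*end) (void *state);
} DemoSampleMethod;

typedef struct
{
    uint32_t nblocks;
    uint16_t tuples_per_block;
    uint32_t next_block;    /* next block to hand out */
    uint16_t next_offset;   /* next offset to test, 1-based */
    uint64_t rng;           /* toy 48-bit LCG state */
    double   fraction;      /* percent / 100 */
} DemoBernoulliState;

static double
demo_fract(uint64_t *s)
{
    *s = (*s * 0x5DEECE66DULL + 0xBULL) & ((1ULL << 48) - 1);
    return (double) *s / (double) (1ULL << 48);  /* in [0, 1) */
}

static void *
demo_bernoulli_init(uint32_t nblocks, uint16_t tuples_per_block,
                    uint32_t seed, double percent)
{
    DemoBernoulliState *st = malloc(sizeof(*st));

    if (st == NULL)
        return NULL;
    st->nblocks = nblocks;
    st->tuples_per_block = tuples_per_block;
    st->next_block = 0;
    st->next_offset = 1;
    st->rng = ((uint64_t) seed << 16) | 0x330eULL;
    st->fraction = percent / 100.0;
    return st;
}

/* BERNOULLI visits every block; only the tuple choice is random. */
static uint32_t
demo_bernoulli_nextblock(void *state)
{
    DemoBernoulliState *st = state;

    if (st->next_block >= st->nblocks)
        return DEMO_INVALID_BLOCK;
    st->next_offset = 1;
    return st->next_block++;
}

/* Return the next chosen offset in the current block, or the invalid marker. */
static uint16_t
demo_bernoulli_nexttuple(void *state)
{
    DemoBernoulliState *st = state;

    while (st->next_offset <= st->tuples_per_block)
    {
        uint16_t off = st->next_offset++;

        if (demo_fract(&st->rng) < st->fraction)
            return off;
    }
    return DEMO_INVALID_OFFSET;
}

static void
demo_bernoulli_end(void *state)
{
    free(state);
}

static const DemoSampleMethod demo_bernoulli = {
    demo_bernoulli_init,
    demo_bernoulli_nextblock,
    demo_bernoulli_nexttuple,
    demo_bernoulli_end
};

/* Driver loop; returns the number of sampled tuples, or -1 on failure. */
static long
demo_sample_scan(const DemoSampleMethod *method, uint32_t nblocks,
                 uint16_t tuples_per_block, uint32_t seed, double percent)
{
    void *state = method->init(nblocks, tuples_per_block, seed, percent);
    long  count = 0;

    if (state == NULL)
        return -1;
    while (method->nextblock(state) != DEMO_INVALID_BLOCK)
    {
        /* a real scan would fetch each chosen tuple of this block here */
        while (method->nexttuple(state) != DEMO_INVALID_OFFSET)
            count++;
    }
    method->end(state);
    return count;
}
```

The executor side only needs the two "invalid" markers to know when a block or the whole scan is finished, which is why a block-level method such as SYSTEM can put its randomness in nextblock and a row-level method such as BERNOULLI in nexttuple without the driver changing.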
On Wed, Dec 10, 2014 at 6:24 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hello,
Attached is a basic implementation of TABLESAMPLE clause. It's SQL standard
clause and couple of people tried to submit it before so I think I don't
need to explain in length what it does - basically returns "random" sample
of a table using a specified sampling method.
Tablesample, yay!
Sadly, when the jsonb functions patch was committed a few OIDs were
used, so you should update the ones you are using, at least to make
the patch easier to test.
The test added for this failed; attached is the diff. I didn't look
into why it failed.
Finally, I created a view with a tablesample clause. I used the view
and the tablesample worked, then dumped and restored and the
tablesample clause went away... actually pg_get_viewdef() didn't see
it at all.
I will look at the patch more closely tomorrow when my brain wakes up ;)
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157
Attachments:
regression.diffs (application/octet-stream)
*** /home/jcasanov/Documentos/2ndQuadrant/postgresql/src/test/regress/expected/tablesample.out 2014-12-16 01:10:00.229590530 -0500
--- /home/jcasanov/Documentos/2ndQuadrant/postgresql/src/test/regress/results/tablesample.out 2014-12-16 02:18:43.262035533 -0500
***************
*** 55,62 ****
3
4
5
! 9
! (6 rows)
SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
id
--- 55,61 ----
3
4
5
! (5 rows)
SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
id
***************
*** 65,68 ****
5
(2 rows)
- DROP TABLE test_tablesample;
--- 64,66 ----
======================================================================
On 16/12/14 08:43, Jaime Casanova wrote:
On Wed, Dec 10, 2014 at 6:24 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hello,
Attached is a basic implementation of TABLESAMPLE clause. It's SQL standard
clause and couple of people tried to submit it before so I think I don't
need to explain in length what it does - basically returns "random" sample
of a table using a specified sampling method.
Tablesample, yay!
Sadly, when the jsonb functions patch was committed a few OIDs were
used, so you should update the ones you are using, at least to make
the patch easier to test.
Will do.
The test added for this failed; attached is the diff. I didn't look
into why it failed.
It isn't immediately obvious to me why; I will look into it.
Finally, I created a view with a tablesample clause. I used the view
and the tablesample worked, then dumped and restored and the
tablesample clause went away... actually pg_get_viewdef() didn't see
it at all.
Yeah, as I mentioned in the submission the ruleutils support is not
there yet, so that's expected.
I will look at the patch more closely tomorrow when my brain wakes up ;)
Thanks!
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I noticed that this makes REPEATABLE a reserved keyword, which is
currently an unreserved one. Can we avoid that?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 17/12/14 19:55, Alvaro Herrera wrote:
I noticed that this makes REPEATABLE a reserved keyword, which is
currently an unreserved one. Can we avoid that?
I added it because I did not find any other way to fix the shift/reduce
conflicts that bison complained about. I am no bison expert though.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
v2 version of this patch is attached.
On 16/12/14 09:31, Petr Jelinek wrote:
On 16/12/14 08:43, Jaime Casanova wrote:
Sadly, when the jsonb functions patch was committed a few OIDs were
used, so you should update the ones you are using, at least to make
the patch easier to test.
Will do.
Done.
The test added for this failed; attached is the diff. I didn't look
into why it failed.
It isn't immediately obvious to me why; I will look into it.
Fixed.
Finally, I created a view with a tablesample clause. I used the view
and the tablesample worked, then dumped and restored and the
tablesample clause went away... actually pg_get_viewdef() didn't see
it at all.
Yeah, as I mentioned in the submission the ruleutils support is not
there yet, so that's expected.
Also fixed.
I also added proper costing/row estimation. I consider this patch
feature complete now, though the docs could still use improvement.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
tablesample-v2.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..250ae29 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,38 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ The <literal>TABLESAMPLE</literal> clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of the rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ one of:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept only a single argument,
+ which is the percentage (floating point from 0 to 100) of the rows to
+ be returned.
+ The <literal>SYSTEM</literal> sampling method does block-level
+ sampling with each block having the same chance of being selected and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> method scans the whole table and returns
+ individual rows with equal probability.
+ The optional <literal>REPEATABLE</literal> clause takes a numeric value
+ that is used as the random seed for sampling.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..595737c 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ transam tsm
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/tsm/Makefile b/src/backend/access/tsm/Makefile
new file mode 100644
index 0000000..73bbbd7
--- /dev/null
+++ b/src/backend/access/tsm/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for access/tsm
+#
+# IDENTIFICATION
+# src/backend/access/tsm/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/tsm
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = tsm_system.o tsm_bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/tsm/tsm_bernoulli.c b/src/backend/access/tsm/tsm_bernoulli.c
new file mode 100644
index 0000000..ad419b9
--- /dev/null
+++ b/src/backend/access/tsm/tsm_bernoulli.c
@@ -0,0 +1,174 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_bernoulli.c
+ * interface routines for BERNOULLI table sample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/access/tsm/tsm_bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/tsm_bernoulli.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+
+
+/* Per-scan state for the BERNOULLI sampling method */
+typedef struct
+{
+ long seed;
+ BlockNumber tblocks;
+ BlockNumber blockno;
+ float percent;
+ OffsetNumber lt; /* last tuple returned from current block */
+} BernoulliSamplerData;
+
+
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ long seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_GETARG_FLOAT4(2);
+ Relation rel = scanstate->ss.ss_currentRelation;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size can be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks = RelationGetNumberOfBlocks(rel);
+ sampler->blockno = InvalidBlockNumber;
+ sampler->percent = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+
+ sampler_setseed(seed);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = 0;
+ else if (++sampler->blockno >= sampler->tblocks)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ double percent = sampler->percent;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /* Each tuple is returned with probability "percent" (a fraction 0..1) */
+ while (sampler_random_fract() > percent)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_setseed(sampler->seed);
+
+ PG_RETURN_VOID();
+}
+
+Datum
+tsm_bernoulli_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ SamplePath *path = (SamplePath *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(3);
+ double *tuples = (double *) PG_GETARG_POINTER(4);
+ List *args = path->tsmargs;
+ Node *pctnode;
+ float4 percent;
+
+ SamplerAccessStrategy *strategy = (SamplerAccessStrategy *) PG_GETARG_POINTER(5);
+
+ *strategy = SAS_SEQUENTIAL;
+ *pages = baserel->pages;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (!IsA(pctnode, Const))
+ {
+ *tuples = baserel->tuples * 0.1;
+ PG_RETURN_VOID();
+ }
+
+ percent = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ percent /= 100.0;
+
+ *tuples = baserel->tuples * percent;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/backend/access/tsm/tsm_system.c b/src/backend/access/tsm/tsm_system.c
new file mode 100644
index 0000000..733e2ae
--- /dev/null
+++ b/src/backend/access/tsm/tsm_system.c
@@ -0,0 +1,164 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_system.c
+ * interface routines for SYSTEM table sample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/access/tsm/tsm_system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/tsm_system.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+
+
+/* Per-scan state for SYSTEM sampling; bs is Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockSamplerData bs;
+ long seed;
+ BlockNumber tblocks;
+ int samplesize;
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ long seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_GETARG_FLOAT4(2);
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size can be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks = RelationGetNumberOfBlocks(scanstate->ss.ss_currentRelation);
+ sampler->samplesize = 1 + (int) (sampler->tblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ sampler_setseed(seed);
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ sampler_setseed(sampler->seed);
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize);
+
+ PG_RETURN_VOID();
+}
+
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ SamplePath *path = (SamplePath *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(3);
+ double *tuples = (double *) PG_GETARG_POINTER(4);
+ List *args = path->tsmargs;
+ Node *pctnode;
+ float4 percent;
+
+ SamplerAccessStrategy *strategy = (SamplerAccessStrategy *) PG_GETARG_POINTER(5);
+
+ *strategy = SAS_RANDOM;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (!IsA(pctnode, Const))
+ {
+ *pages = baserel->pages * 0.1;
+ *tuples = baserel->tuples * 0.1;
+ PG_RETURN_VOID();
+ }
+
+ percent = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ percent /= 100.0;
+
+ *pages = baserel->pages * percent;
+ *tuples = baserel->tuples * percent;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..5598244 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesamplemethod.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 732ab22..4b011c7 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -88,10 +78,6 @@ static BufferAccessStrategy vac_strategy;
static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -947,94 +933,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1089,6 +987,8 @@ acquire_sample_rows(Relation onerel, int elevel,
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
OldestXmin = GetOldestXmin(onerel, true);
+ /* Seed the sampler random number generator */
+ sampler_setseed(random());
/* Prepare for sampling block numbers */
BlockSampler_Init(&bs, totalblocks, targrows);
/* Prepare for sampling rows */
@@ -1249,7 +1149,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1308,13 +1208,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
/*
* These two routines embody Algorithm Z from "Random sampling with a
* reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
@@ -1333,7 +1226,7 @@ double
anl_init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
+ return exp(-log(sampler_random_fract()) / n);
}
double
@@ -1348,7 +1241,7 @@ anl_get_next_S(double t, int n, double *stateptr)
double V,
quot;
- V = anl_random_fract(); /* Generate V */
+ V = sampler_random_fract(); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -1380,7 +1273,7 @@ anl_get_next_S(double t, int n, double *stateptr)
tmp;
/* Generate U and X */
- U = anl_random_fract();
+ U = sampler_random_fract();
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -1409,7 +1302,7 @@ anl_get_next_S(double t, int n, double *stateptr)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 064f880..d5d703d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -724,6 +724,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -950,6 +951,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ pname = sname = "Sample Scan";
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1067,6 +1071,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1319,6 +1324,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2147,6 +2153,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 7027d7f..1826059 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index d5079ef..613f799 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index e27c062..a1cba97 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..13af326
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,405 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_tablesamplemethod.h"
+#include "executor/executor.h"
+#include "access/relscan.h"
+#include "executor/nodeSamplescan.h"
+#include "parser/parsetree.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate, int eflags);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+
+/*
+ * Initialize the sampling method - loads function info and
+ * calls the tsminit function.
+ *
+ * We need special handling for this because the tsminit function
+ * is allowed to take optional additional arguments.
+ */
+static void
+InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(scanstate->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(scanstate->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(scanstate->tsmnexttuple));
+ fmgr_info(tablesample->tsmend, &(scanstate->tsmend));
+ fmgr_info(tablesample->tsmreset, &(scanstate->tsmreset));
+
+ InitFunctionCallInfoData(fcinfo, &scanstate->tsminit,
+ list_length(args) + 2,
+ InvalidOid, NULL, NULL);
+
+ /* First arg is always SampleScanState */
+ fcinfo.arg[0] = PointerGetDatum(scanstate);
+ fcinfo.argnull[0] = false;
+
+ /*
+ * Second arg is always REPEATABLE
+ * When tablesample->repeatable is NULL then REPEATABLE clause was not
+ * specified.
+ * When specified, the expression cannot evaluate to NULL.
+ */
+ if (tablesample->repeatable)
+ {
+ ExprState *argstate = ExecInitExpr((Expr *) tablesample->repeatable,
+ (PlanState *) scanstate);
+ fcinfo.arg[1] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[1], NULL);
+ if (fcinfo.argnull[1])
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("REPEATABLE clause cannot be NULL")));
+ }
+ else
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+
+ i = 2;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ (void) FunctionCallInvoke(&fcinfo);
+}
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is a workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ EState *estate;
+ TupleTableSlot *slot;
+ BlockNumber blockno = InvalidBlockNumber;
+ Snapshot snapshot;
+ Relation relation;
+ bool found = false;
+ bool retry = false;
+ OffsetNumber tupoffset, maxoffset;
+ Buffer buffer;
+ Page page;
+ HeapTuple tuple = &(node->tup);
+
+ /*
+ * get information from the estate and scan state
+ */
+ estate = node->ss.ps.state;
+ snapshot = estate->es_snapshot;
+ slot = node->ss.ss_ScanTupleSlot;
+ relation = node->ss.ss_currentRelation;
+ buffer = node->openbuffer;
+
+ if (BufferIsValid(buffer))
+ {
+ blockno = BufferGetBlockNumber(buffer);
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ /*
+ * get the next tuple from the table
+ */
+ for (;;)
+ {
+ ItemId itemid;
+
+ /* Load next block if needed. */
+ if (!BufferIsValid(buffer))
+ {
+ blockno = DatumGetUInt32(FunctionCall2(&node->tsmnextblock,
+ PointerGetDatum(node),
+ BoolGetDatum(retry)));
+ /* No more blocks to fetch */
+ if (!BlockNumberIsValid(blockno))
+ break;
+
+ buffer = ReadBufferExtended(relation, MAIN_FORKNUM, blockno,
+ RBM_NORMAL, NULL);
+ LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+ node->openbuffer = buffer;
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ tupoffset = DatumGetUInt16(FunctionCall4(&node->tsmnexttuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset),
+ BoolGetDatum(retry)));
+ /* Go to next block. */
+ if (!OffsetNumberIsValid(tupoffset))
+ {
+ UnlockReleaseBuffer(buffer);
+ node->openbuffer = buffer = InvalidBuffer;
+ continue;
+ }
+ retry = true;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_tableOid = RelationGetRelid(relation);
+ tuple->t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&tuple->t_self, blockno, tupoffset);
+
+ /* Found visible tuple, return it. */
+ if (HeapTupleSatisfiesVisibility(tuple, snapshot, buffer))
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (found)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ buffer, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation block by block as directed by the sampling
+ * method functions and returns the next qualifying tuple.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+ node->ss.ss_currentScanDesc = NULL;
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ /*
+ * A SampleScan is a leaf node: it has neither an outer nor an inner plan,
+ * and its range table entry must carry a TABLESAMPLE clause.
+ */
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ scanstate->openbuffer = InvalidBuffer;
+
+ InitSamplingMethod(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell the sampling method that we have finished the scan.
+ */
+ FunctionCall1(&node->tsmend, PointerGetDatum(node));
+
+ if (BufferIsValid(node->openbuffer))
+ {
+ UnlockReleaseBuffer(node->openbuffer);
+ node->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *node)
+{
+ if (BufferIsValid(node->openbuffer))
+ {
+ UnlockReleaseBuffer(node->openbuffer);
+ node->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ FunctionCall1(&node->tsmreset, PointerGetDatum(node));
+
+ ExecScanReScan((ScanState *) node);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 491e4db..d69cc4e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -628,6 +628,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2006,6 +2022,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2138,6 +2155,37 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_SCALAR_FIELD(tsmcost);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4077,6 +4125,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4725,6 +4776,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_PrivGrantee:
retval = _copyPrivGrantee(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 0803674..83c5a25 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2325,6 +2325,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2444,6 +2445,33 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_SCALAR_FIELD(tsmcost);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3152,6 +3180,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_PrivGrantee:
retval = _equalPrivGrantee(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index ae857a0..66b40dc 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3209,6 +3209,18 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->repeatable, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e3e29f5..7018512 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -578,6 +578,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -1589,6 +1597,17 @@ _outTidPath(StringInfo str, const TidPath *node)
}
static void
+_outSamplePath(StringInfo str, const SamplePath *node)
+{
+ WRITE_NODE_TYPE("SAMPLEPATH");
+
+ _outPathInfo(str, (const Path *) node);
+
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(tsmargs);
+}
+
+static void
_outForeignPath(StringInfo str, const ForeignPath *node)
{
WRITE_NODE_TYPE("FOREIGNPATH");
@@ -2391,6 +2410,33 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2420,6 +2466,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2887,6 +2934,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3092,6 +3142,9 @@ _outNode(StringInfo str, const void *obj)
case T_TidPath:
_outTidPath(str, obj);
break;
+ case T_SamplePath:
+ _outSamplePath(str, obj);
+ break;
case T_ForeignPath:
_outForeignPath(str, obj);
break;
@@ -3228,6 +3281,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a3efdd4..3a510dd 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,43 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_OID_FIELD(tsmcost);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1216,6 +1253,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1311,6 +1349,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 449fdc3..dffab29 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,8 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +425,34 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+
+ /*
+ * We don't support pushing join clauses into the quals of a sample scan, but
+ * it could still have required parameterization due to LATERAL refs in
+ * its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
+
+ /*
+ * There is only one path to consider, but we still need to set the
+ * cheapest-path fields of the RelOptInfo.
+ */
+ set_cheapest(rel);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 659daa2..d0741f0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -88,6 +88,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/parsetree.h"
#include "utils/lsyscache.h"
+#include "utils/sampling.h"
#include "utils/selfuncs.h"
#include "utils/spccache.h"
#include "utils/tuplesort.h"
@@ -219,6 +220,72 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * From the planner's perspective we don't care much about the cost itself,
+ * since a sampling scan is always the only scan path considered for the
+ * relation, but the row-count estimate is still important.
+ *
+ * 'path' is the SamplePath to cost; its param_info is set if it is parameterized
+ * 'baserel' is the relation to be scanned
+ */
+void
+cost_samplescan(SamplePath *path, PlannerInfo *root, RelOptInfo *baserel)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+ BlockNumber pages;
+ double tuples;
+ SamplerAccessStrategy strategy;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Call the sampling method's costing function. */
+ OidFunctionCall6(path->tsmcost, PointerGetDatum(root),
+ PointerGetDatum(path), PointerGetDatum(baserel),
+ PointerGetDatum(&pages), PointerGetDatum(&tuples),
+ PointerGetDatum(&strategy));
+
+ /* Mark the path with the correct row estimate */
+ if (path->path.param_info)
+ path->path.rows = path->path.param_info->ppi_rows;
+ else
+ path->path.rows = tuples;
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+
+ spc_page_cost = strategy == SAS_RANDOM ?
+ spc_random_page_cost : spc_seq_page_cost;
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_page_cost * pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, path->path.param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * tuples;
+
+ path->path.startup_cost = startup_cost;
+ path->path.total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8f9ae4f..1056885 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -58,6 +58,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -100,6 +102,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -228,6 +231,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -343,6 +347,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -546,6 +557,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1133,6 +1145,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3318,6 +3369,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 4d3fbca..0d78f27 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -446,6 +446,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 579d021..7da1a44 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2163,6 +2163,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 319e8b2..3c2c1b8 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,33 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+SamplePath *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ SamplePath *pathnode = makeNode(SamplePath);
+ RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
+ TableSampleClause *tablesample = rte->tablesample;
+
+ Assert(tablesample);
+
+ pathnode->path.pathtype = T_SampleScan;
+ pathnode->path.parent = rel;
+ pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->path.pathkeys = NIL; /* samplescan has unordered result */
+
+ pathnode->tsmcost = tablesample->tsmcost;
+ pathnode->tsmargs = tablesample->args;
+
+ cost_samplescan(pathnode, root, rel);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1921,6 +1948,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return (Path *) create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 1f4fe9d..9d0e05f 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -447,6 +447,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -611,8 +612,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10137,6 +10138,12 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample opt_alias_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10432,7 +10439,6 @@ relation_expr_list:
| relation_expr_list ',' relation_expr { $$ = lappend($1, $3); }
;
-
/*
* Given "UPDATE foo set set ...", we have to decide without looking any
* further ahead whether the first "set" is an alias or the UPDATE's SET
@@ -10462,6 +10468,31 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $2;
+ n->relation = $1;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = $4;
+ n->repeatable = $6;
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' Iconst ')' { $$ = makeIntConst($3, @3); }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13244,7 +13275,6 @@ unreserved_keyword:
| RELATIVE_P
| RELEASE
| RENAME
- | REPEATABLE
| REPLACE
| REPLICA
| RESET
@@ -13419,6 +13449,7 @@ type_func_name_keyword:
| OVERLAPS
| RIGHT
| SIMILAR
+ | TABLESAMPLE
| VERBOSE
;
@@ -13487,6 +13518,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+ | REPEATABLE
| RETURNING
| SELECT
| SESSION_USER
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 4931dca..6d64a84 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -413,6 +416,19 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
+{
+ RangeTblEntry *rte;
+ TableSampleClause *tablesample = NULL;
+
+ rte = transformTableEntry(pstate, r->relation);
+ tablesample = ParseTableSample(pstate, r->method, r->repeatable, r->args);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -421,7 +437,7 @@ transformTableEntry(ParseState *pstate, RangeVar *r)
{
RangeTblEntry *rte;
- /* We need only build a range table entry */
+ /* We first need to build a range table entry */
rte = addRangeTableEntry(pstate, r, r->alias,
interpretInhOption(r->inhOpt), true);
@@ -1121,6 +1137,26 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 9ebd3fd..cc91af2 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesamplemethod.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -760,6 +762,120 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
+ List *sampleargs)
+{
+ HeapTuple tuple;
+ Form_pg_tablesamplemethod tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, initnargs;
+ Oid actual_arg_types[FUNC_MAX_ARGS];
+ Oid init_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the table sample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("table sample method \"%s\" does not exist",
+ samplemethod)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesamplemethod) GETSTRUCT(tuple);
+
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+ tablesample->tsmcost = tsm->tsmcost;
+
+ ReleaseSysCache(tuple);
+
+ /* Validate the parameters against init function definition. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ initnargs = procform->pronargs;
+ Assert(initnargs >= 3);
+
+ /*
+ * The first parameter is used to pass the SampleScanState and the second
+ * is the seed (REPEATABLE). Skip processing them here and just assert
+ * that their types are correct.
+ * XXX: maybe make this ereport?
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ Assert(procform->proargtypes.values[1] == INT4OID);
+ initnargs -= 2;
+ memcpy(init_arg_types, procform->proargtypes.values + 2,
+ initnargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Process repeatable (seed) */
+ if (repeatable != NULL)
+ {
+ Node *arg = transformExpr(pstate, repeatable, EXPR_KIND_FROM_FUNCTION);
+
+ arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
+ tablesample->repeatable = arg;
+ }
+ else
+ tablesample->repeatable = NULL;
+
+ /* Transform the rest of arguments ... */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *arg = transformExpr(pstate, (Node *) lfirst(larg), EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ fargs = lappend(fargs, arg);
+
+ actual_arg_types[nargs++] = argtype;
+ }
+
+ /*
+ * Check if parameters are correct.
+ *
+ * XXX: can we do better at hinting here?
+ */
+ if (initnargs != nargs ||
+ !can_coerce_type(initnargs, actual_arg_types, init_arg_types,
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameters for TABLESAMPLE method \"%s\"",
+ samplemethod)));
+
+ /* perform the necessary typecasting of arguments */
+ make_fn_arguments(pstate, fargs, actual_arg_types, init_arg_types);
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 24ade6c..56d1266 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -31,6 +31,7 @@
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesamplemethod.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -343,6 +344,8 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags);
static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags, int wrapColumn);
+static void get_tablesample_def(TableSampleClause *tablesample,
+ deparse_context *context);
static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
TupleDesc resultDesc,
int prettyFlags, int wrapColumn, int startIndent);
@@ -4157,6 +4160,50 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
heap_close(ev_relation, AccessShareLock);
}
+/* ----------
+ * get_tablesample_def - Convert TableSampleClause back to SQL
+ * ----------
+ */
+static void
+get_tablesample_def(TableSampleClause *tablesample, deparse_context *context)
+{
+ StringInfo buf = context->buf;
+ HeapTuple tuple;
+ Form_pg_tablesamplemethod tsm;
+ char *tsmname;
+ int nargs;
+ ListCell *l;
+
+ /* Load the table sample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tablesample->tsmid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("cache lookup failed for table sample method %u",
+ tablesample->tsmid)));
+
+ tsm = (Form_pg_tablesamplemethod) GETSTRUCT(tuple);
+ tsmname = NameStr(tsm->tsmname);
+ appendStringInfo(buf, " TABLESAMPLE %s (", quote_identifier(tsmname));
+
+ ReleaseSysCache(tuple);
+
+ nargs = 0;
+ foreach(l, tablesample->args)
+ {
+ if (nargs++ > 0)
+ appendStringInfoString(buf, ", ");
+ get_rule_expr((Node *) lfirst(l), context, true);
+ }
+ appendStringInfoChar(buf, ')');
+
+ if (tablesample->repeatable != NULL)
+ {
+ appendStringInfoString(buf, " REPEATABLE (");
+ get_rule_expr(tablesample->repeatable, context, true);
+ appendStringInfoChar(buf, ')');
+ }
+}
/* ----------
* get_query_def - Parse back one query parsetree
@@ -8384,6 +8431,9 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
only_marker(rte),
generate_relation_name(rte->relid,
context->namespaces));
+
+ if (rte->tablesample)
+ get_tablesample_def(rte->tablesample, context);
break;
case RTE_SUBQUERY:
/* Subquery RTE */
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 94d951c..6832e0b 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesamplemethod.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesamplemethod_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index c7b745e..f311c74 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o rbtree.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..c07f01e
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,131 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Block sampling routines shared by ANALYZE and TABLESAMPLE.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+static unsigned short _sampler_seed[3] = { 0x330e, 0xabcd, 0x1234 };
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler is used for stage one of our new two-stage tuple
+ * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
+ * "Large DB"). It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires a sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+
+void
+sampler_setseed(long seed)
+{
+ _sampler_seed[0] = 0x330e;
+ _sampler_seed[1] = (unsigned short) seed;
+ _sampler_seed[2] = (unsigned short) (seed >> 16);
+}
+
+/* Select a random value R uniformly distributed in (0 - 1) */
+double
+sampler_random_fract(void)
+{
+ return pg_erand48(_sampler_seed);
+}
diff --git a/src/include/access/tsm_bernoulli.h b/src/include/access/tsm_bernoulli.h
new file mode 100644
index 0000000..00cd069
--- /dev/null
+++ b/src/include/access/tsm_bernoulli.h
@@ -0,0 +1,20 @@
+/*--------------------------------------------------------------------------
+ * tsm_bernoulli.h
+ * Header file for BERNOULLI table sampling method.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/access/tsm_bernoulli.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TSM_BERNOULLI_H
+#define TSM_BERNOULLI_H
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_cost(PG_FUNCTION_ARGS);
+
+#endif /* TSM_BERNOULLI_H */
diff --git a/src/include/access/tsm_system.h b/src/include/access/tsm_system.h
new file mode 100644
index 0000000..4021470
--- /dev/null
+++ b/src/include/access/tsm_system.h
@@ -0,0 +1,20 @@
+/*--------------------------------------------------------------------------
+ * tsm_system.h
+ * Header file for SYSTEM table sampling method.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/access/tsm_system.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TSM_SYSTEM_H
+#define TSM_SYSTEM_H
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_system_cost(PG_FUNCTION_ARGS);
+
+#endif /* TSM_SYSTEM_H */
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index bde1a84..5eb4811 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesamplemethod_name_index, 3281, on pg_tablesamplemethod using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3281
+DECLARE_UNIQUE_INDEX(pg_tablesamplemethod_oid_index, 3282, on pg_tablesamplemethod using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3282
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index eace352..54359f7 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5136,6 +5136,31 @@ DESCR("rank of hypothetical row without gaps");
DATA(insert OID = 3993 ( dense_rank_final PGNSP PGUID 12 1 0 2276 0 f f f f f f i 2 0 20 "2281 2276" "{2281,2276}" "{i,v}" _null_ _null_ hypothetical_dense_rank_final _null_ _null_ _null_ ));
DESCR("aggregate final function");
+DATA(insert OID = 3285 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3286 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 23 "2281" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3287 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 21 "2281" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3288 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3289 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+DATA(insert OID = 3290 ( tsm_system_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "700" _null_ _null_ _null_ _null_ tsm_system_cost _null_ _null_ _null_ ));
+DESCR("tsm_system_cost(internal)");
+
+DATA(insert OID = 3291 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3292 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 23 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3293 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 21 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3294 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3296 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
+DATA(insert OID = 3297 ( tsm_bernoulli_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "700" _null_ _null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_cost(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesamplemethod.h b/src/include/catalog/pg_tablesamplemethod.h
new file mode 100644
index 0000000..a0ce3ab
--- /dev/null
+++ b/src/include/catalog/pg_tablesamplemethod.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesamplemethod.h
+ * definition of the table sample methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesamplemethod.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLEMETHOD_H
+#define PG_TABLESAMPLEMETHOD_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_tablesamplemethod definition. cpp turns this into
+ * typedef struct FormData_pg_tablesamplemethod
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3280
+
+CATALOG(pg_tablesamplemethod,3280)
+{
+ NameData tsmname; /* table sample method name */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockNumber if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmend; /* end scan function*/
+ regproc tsmreset; /* reset state - used by rescan */
+ regproc tsmcost; /* costing function */
+} FormData_pg_tablesamplemethod;
+
+/* ----------------
+ * Form_pg_tablesamplemethod corresponds to a pointer to a tuple with
+ * the format of pg_tablesamplemethod relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesamplemethod *Form_pg_tablesamplemethod;
+
+/* ----------------
+ * compiler constants for pg_tablesamplemethod
+ * ----------------
+ */
+#define Natts_pg_tablesamplemethod 7
+#define Anum_pg_tablesamplemethod_tsmname 1
+#define Anum_pg_tablesamplemethod_tsminit 2
+#define Anum_pg_tablesamplemethod_tsmnextblock 3
+#define Anum_pg_tablesamplemethod_tsmnexttuple 4
+#define Anum_pg_tablesamplemethod_tsmend 5
+#define Anum_pg_tablesamplemethod_tsmreset 6
+#define Anum_pg_tablesamplemethod_tsmcost 7
+
+/* ----------------
+ * initial contents of pg_tablesamplemethod
+ * ----------------
+ */
+
+DATA(insert OID = 3283 ( system tsm_system_init tsm_system_nextblock tsm_system_nexttuple tsm_system_end tsm_system_reset tsm_system_cost ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3284 ( bernoulli tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLEMETHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41b13b2..b7f3129 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1212,6 +1212,26 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+
+ /* Sampling method functions. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+
+ Buffer openbuffer; /* currently open buffer */
+ HeapTupleData tup; /* last tuple */
+
+ void *tsmdata; /* for use by table scan method */
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index bc71fea..01d4795 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,6 +61,7 @@ typedef enum NodeTag
T_ValuesScan,
T_CteScan,
T_WorkTableScan,
+ T_SampleScan,
T_ForeignScan,
T_CustomScan,
T_Join,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -225,6 +227,7 @@ typedef enum NodeTag
T_MergePath,
T_HashPath,
T_TidPath,
+ T_SamplePath,
T_ForeignPath,
T_CustomPath,
T_AppendPath,
@@ -413,6 +416,8 @@ typedef enum NodeTag
T_XmlSerialize,
T_WithClause,
T_CommonTableExpr,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 458eeb0..62c2c57 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -307,6 +307,23 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - information about a sampling method
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmend;
+ Oid tsmreset;
+ Oid tsmcost;
+ Node *repeatable;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -507,6 +524,21 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than SQL Standard so we pass generic function
+ * arguments to the sampling method.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ Node *repeatable;
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -751,6 +783,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 48203a0..8427b44 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -278,6 +278,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 7116496..064e336 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -870,6 +870,18 @@ typedef struct TidPath
} TidPath;
/*
+ * SamplePath represents a sample scan
+ *
+ * tsmargs is the list of parameters for the TABLESAMPLE clause
+ */
+typedef struct SamplePath
+{
+ Path path;
+ Oid tsmcost; /* table sample method costing function */
+ List *tsmargs; /* arguments to a TABLESAMPLE clause */
+} SamplePath;
+
+/*
* ForeignPath represents a potential scan of a foreign table
*
* fdw_private stores FDW private data about the scan. While fdw_private is
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 75e2afb..97bc0ba 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(SamplePath *path, PlannerInfo *root, RelOptInfo *baserel);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 26b17f5..6c0a6cf 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern SamplePath *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index e14dc9a..e565082 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -312,7 +312,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD)
PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD)
PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD)
-PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD)
+PG_KEYWORD("repeatable", REPEATABLE, RESERVED_KEYWORD)
PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD)
PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD)
PG_KEYWORD("reset", RESET, UNRESERVED_KEYWORD)
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 4423bc0..0ba9768 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,10 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod,
+ Node *repeatable, List *args);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 48ebf59..1ba06b6 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..734cdc0
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,49 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+typedef enum SamplerAccessStrategy
+{
+ SAS_RANDOM,
+ SAS_SEQUENTIAL
+} SamplerAccessStrategy;
+
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Vitter reservoir sampling functions */
+extern double vitter_init_selection_state(int n);
+extern double vitter_get_next_S(double t, int n, double *stateptr);
+
+/* Random generator */
+extern void sampler_setseed(long seed);
+extern double sampler_random_fract(void);
+
+#endif /* SAMPLING_H */
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index f97229f..29244c7 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..970d4da 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesamplemethod|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..3d23ca1
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,67 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+(10 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+DROP TABLE test_tablesample;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 62cc198..cf789dc 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity tablesample
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 07fc827..852fed9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -151,3 +151,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..5f6e828
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,12 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+DROP TABLE test_tablesample;
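
For anyone who wants to poke at the block selection logic from sampling.c outside the backend, here is a minimal standalone sketch of the same Algorithm S loop. It is only an illustration under stated assumptions: DemoSampler and the demo_* functions are hypothetical names, not part of the patch, and a small 64-bit LCG stands in for pg_erand48(), so the block numbers it picks will not match what the server would pick for a given seed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for BlockSamplerData; field names follow the patch. */
typedef struct
{
	uint32_t	N;		/* number of blocks, known in advance */
	int			n;		/* desired sample size */
	uint32_t	t;		/* blocks scanned so far */
	int			m;		/* blocks selected so far */
} DemoSampler;

/* Assumption: a simple LCG replaces pg_erand48() to keep this self-contained. */
static uint64_t demo_rng_state = 1;

static void
demo_setseed(uint32_t seed)
{
	demo_rng_state = seed;
}

/* Returns a value in [0, 1) built from the top 53 bits of the LCG state. */
static double
demo_random_fract(void)
{
	demo_rng_state = demo_rng_state * 6364136223846793005ULL
		+ 1442695040888963407ULL;
	return (double) (demo_rng_state >> 11) / 9007199254740992.0;
}

static void
demo_init(DemoSampler *bs, uint32_t nblocks, int samplesize)
{
	bs->N = nblocks;
	bs->n = samplesize;
	bs->t = 0;
	bs->m = 0;
}

static bool
demo_has_more(const DemoSampler *bs)
{
	return (bs->t < bs->N) && (bs->m < bs->n);
}

/* Same structure as BlockSampler_Next: one random draw per selected block. */
static uint32_t
demo_next(DemoSampler *bs)
{
	uint32_t	K = bs->N - bs->t;	/* remaining blocks */
	int			k = bs->n - bs->m;	/* blocks still to sample */
	double		p;					/* probability to skip block */
	double		V;

	if ((uint32_t) k >= K)
	{
		/* need all the rest */
		bs->m++;
		return bs->t++;
	}

	V = demo_random_fract();
	p = 1.0 - (double) k / (double) K;
	while (V < p)
	{
		/* skip this block and shrink the cutoff for the next one */
		bs->t++;
		K--;
		p *= 1.0 - (double) k / (double) K;
	}

	bs->m++;
	return bs->t++;
}

/* Collect a whole sample into out[] (room for samplesize entries needed). */
static int
demo_collect(uint32_t seed, uint32_t nblocks, int samplesize, uint32_t *out)
{
	DemoSampler bs;
	int			count = 0;

	demo_setseed(seed);
	demo_init(&bs, nblocks, samplesize);
	while (demo_has_more(&bs))
		out[count++] = demo_next(&bs);
	return count;
}
```

Calling demo_collect(42, 100, 10, blocks) should fill blocks[] with 10 strictly increasing block numbers below 100, and calling it again with the same seed should return the same list, which is the property REPEATABLE relies on. When the table has fewer blocks than the sample size, every block is returned.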
Hi,
On 18.12.2014 13:14, Petr Jelinek wrote:
Hi,
v2 version of this patch is attached.
I did a review of this v2 patch today. I plan to do a bit more testing,
but these are my comments/questions so far:
(0) There's a TABLESAMPLE page at the wiki, not updated since 2012:
https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation
We should either update it or mark it as obsolete I guess. Also,
I'd like to know what's the status regarding the TODO items
mentioned there. Are those still valid with this patch?
(1) The patch adds a new catalog, but does not bump CATVERSION.
(2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
as it squishes everything into a single chunk. That's inconsistent
with naming of the other catalogs. I think pg_table_sample_method
would be better.
(3) There are a few more strange naming decisions, but that's mostly
because the SQL standard requires that naming. I mean SYSTEM and
BERNOULLI method names, and also the fact that the probability is
specified as 0-100 value, which is inconsistent with other places
(e.g. percentile_cont uses the usual 0-1 probability notion). But
I don't think this can be fixed, that's what the standard says.
(4) I noticed there's an interesting extension in SQL Server, which
allows specifying PERCENT or ROWS, so you can say
SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);
or
SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);
That seems handy, and it'd make migration from SQL Server easier.
What do you think?
(5) I envision a lot of confusion because of the REPEATABLE clause.
With READ COMMITTED, it's not really repeatable because of changes
done by the other users (and maybe things like autovacuum). Shall
we mention this in the documentation?
(6) This seems slightly wrong, because of long/uint32 mismatch:
long seed = PG_GETARG_UINT32(1);
I think uint32 would be more appropriate, no?
(7) NITPICKING: I think a 'sample_rate' would be a better name here:
double percent = sampler->percent;
(8) NITPICKING: InitSamplingMethod contains a command with ';;'
fcinfo.arg[i] = (Datum) 0;;
(9) The current regression tests only use the REPEATABLE cases. I
understand queries without this clause are RANDOM, but maybe we
could do something like this:
SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
) foo;
Granted, there's still a small probability of false positive, but
maybe that's sufficiently small? Or is the amount of code this
tests negligible?
(10) In the initial patch you mentioned it's possible to write custom
sampling methods. Do you think a CREATE TABLESAMPLE METHOD,
allowing custom methods implemented as extensions would be useful?
regards
Tomas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Dec 22, 2014 at 2:38 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
(1) The patch adds a new catalog, but does not bump CATVERSION.
FWIW, this part is managed by the committer when this patch is picked up.
--
Michael
On 21/12/14 18:38, Tomas Vondra wrote:
Hi,
On 18.12.2014 13:14, Petr Jelinek wrote:
Hi,
v2 version of this patch is attached.
I did a review of this v2 patch today. I plan to do a bit more testing,
but these are my comments/questions so far:
Thanks for looking at it!
(0) There's a TABLESAMPLE page at the wiki, not updated since 2012:
https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation
We should either update it or mark it as obsolete I guess. Also,
I'd like to know what's the status regarding the TODO items
mentioned there. Are those still valid with this patch?
I will have to look in more detail over the holidays and update it, but
the general info about table sampling there applies and will apply to
any patch trying to implement it.
A quick look at the mail thread suggests that at least the concerns
mentioned on the mailing list should not apply to this implementation.
And looking at the patch, the problem with BERNOULLI sampling should not
exist either, as I use a completely different implementation for that.
I do also have some issues with joins which I plan to look at, but that's
a different one (my optimizer code overestimates the number of rows).
(1) The patch adds a new catalog, but does not bump CATVERSION.
I thought this was always done by committer?
(2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
as it squishes everything into a single chunk. That's inconsistent
with naming of the other catalogs. I think pg_table_sample_method
would be better.
Fair point, but perhaps pg_tablesample_method then as tablesample is
used as single word everywhere including the standard.
(3) There are a few more strange naming decisions, but that's mostly
because the SQL standard requires that naming. I mean SYSTEM and
BERNOULLI method names, and also the fact that the probability is
specified as 0-100 value, which is inconsistent with other places
(e.g. percentile_cont uses the usual 0-1 probability notion). But
I don't think this can be fixed, that's what the standard says.
Yeah, I don't exactly love that either but what standard says, standard
says.
(4) I noticed there's an interesting extension in SQL Server, which
allows specifying PERCENT or ROWS, so you can say
SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);
or
SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);
That seems handy, and it'd make migration from SQL Server easier.
What do you think?
Well doing it exactly this way somewhat kills the extensibility which
was one of the main goals for me - I allow any kind of parameters for
sampling and the handling of those depends solely on the sampling
method. This means that in my approach if you'd want to change what you
are limiting you'd have to write new sampling method and the query would
then look like SELECT * FROM table TABLESAMPLE SYSTEM_ROWLIMIT (2500);
or some such (depending on how you name the sampling method). Or SELECT
* FROM table TABLESAMPLE SYSTEM (2500, 'ROWS'); if we chose to extend
the SYSTEM sampling method, that would be also doable.
The reason for this is that I don't want to really limit too much what
parameters can be sent to sampling (I envision even SYSTEM_TIMED
sampling method that will get limit as time interval for example).
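To make the trade-off concrete, here is a rough standalone sketch of name-keyed dispatch as described above. It is not code from the patch: the SampleMethodEntry table, find_method() and call_is_valid() are hypothetical stand-ins for the catalog lookup, system_rowlimit is only the made-up method name from the example query, and the check is reduced to an argument count (the patch is said to check the arguments against the init function's definition).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for one catalog row: method name plus argument count. */
typedef struct
{
    const char *name;
    int         nargs;
} SampleMethodEntry;

static const SampleMethodEntry sample_methods[] = {
    {"system", 1},
    {"bernoulli", 1},
    {"system_rowlimit", 1},     /* made-up extension method, for illustration */
};

/* Case-insensitive comparison; returns 1 when both names match exactly. */
static int
name_matches(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Look the method up by name; NULL means "no such sampling method". */
static const SampleMethodEntry *
find_method(const char *name)
{
    size_t      i;

    for (i = 0; i < sizeof(sample_methods) / sizeof(sample_methods[0]); i++)
    {
        if (name_matches(sample_methods[i].name, name))
            return &sample_methods[i];
    }
    return NULL;
}

/* A call is accepted when the method exists and the argument count fits. */
static int
call_is_valid(const char *name, int nargs)
{
    const SampleMethodEntry *m = find_method(name);

    assert(nargs >= 0);
    return m != NULL && m->nargs == nargs;
}
```

With this shape, adding a row-limited method means adding one table entry; the grammar never has to learn about PERCENT or ROWS keywords.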
(5) I envision a lot of confusion because of the REPEATABLE clause.
With READ COMMITTED, it's not really repeatable because of changes
done by the other users (and maybe things like autovacuum). Shall
we mention this in the documentation?
Yes, the docs need improvement and this should be mentioned. Same for the
code comments - I found a few places which I forgot to update when changing
the code and which now have misleading comments.
(6) This seems slightly wrong, because of long/uint32 mismatch:
long seed = PG_GETARG_UINT32(1);
I think uint32 would be more appropriate, no?
Probably, although I need long later in the algorithm anyway.
(7) NITPICKING: I think a 'sample_rate' would be a better name here:
double percent = sampler->percent;
Ok.
(8) NITPICKING: InitSamplingMethod contains a command with ';;'
fcinfo.arg[i] = (Datum) 0;;
Yeah :)
(9) The current regression tests only use the REPEATABLE cases. I
understand queries without this clause are RANDOM, but maybe we
could do something like this:
SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
) foo;
Granted, there's still a small probability of false positive, but
maybe that's sufficiently small? Or is the amount of code this
tests negligible?
Well, depending on fillfactor and limit it could be made quite reliable,
I think. I also want to add a test with a VIEW (I think v2 has a bug there)
and perhaps some JOIN.
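As a rough illustration of why such a test can be made reliable for SYSTEM: the block sampler follows Knuth's Algorithm S (3.4.2), which always selects exactly the requested number of blocks, so only the tuples-per-block spread varies between runs. The sketch below is standalone and makes assumptions: it uses the naive one-draw-per-block form rather than the optimized loop in BlockSampler_Next, rand() stands in for sampler_random_fract(), and the BlockPick names are made up.

```c
#include <assert.h>
#include <stdlib.h>

/* Selection-sampling state; field names follow Knuth's N, n, t, m. */
typedef struct
{
    unsigned    N;              /* total number of blocks */
    unsigned    n;              /* number of blocks wanted */
    unsigned    t;              /* blocks visited so far */
    unsigned    m;              /* blocks selected so far */
} BlockPick;

static void
blockpick_init(BlockPick *bp, unsigned nblocks, unsigned want)
{
    bp->N = nblocks;
    bp->n = want;
    bp->t = 0;
    bp->m = 0;
}

static int
blockpick_has_more(const BlockPick *bp)
{
    return bp->t < bp->N && bp->m < bp->n;
}

/* Uniform value in [0, 1); stand-in for the backend's random source. */
static double
uniform01(void)
{
    return rand() / ((double) RAND_MAX + 1.0);
}

/* Return the next selected block number (strictly increasing). */
static unsigned
blockpick_next(BlockPick *bp)
{
    unsigned    K = bp->N - bp->t;  /* blocks remaining */
    unsigned    k = bp->n - bp->m;  /* blocks still to select */

    assert(blockpick_has_more(bp));

    /* Skip the current block with probability 1 - k/K; never skip if k >= K. */
    while (k < K && uniform01() * K >= k)
    {
        bp->t++;
        K--;
    }
    bp->m++;
    return bp->t++;
}

/* Run a whole sampling pass and report how many blocks were selected. */
static unsigned
blockpick_count(unsigned nblocks, unsigned want, unsigned seed)
{
    BlockPick   bp;
    unsigned    count = 0;

    srand(seed);
    blockpick_init(&bp, nblocks, want);
    while (blockpick_has_more(&bp))
    {
        (void) blockpick_next(&bp);
        count++;
    }
    return count;
}
```

For a 100-block table and SYSTEM (50), tsm_system_init in v3 computes 1 + (int) (100 * 0.5) = 51 blocks, and a pass like the one above returns 51 blocks for any seed.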
(10) In the initial patch you mentioned it's possible to write custom
sampling methods. Do you think a CREATE TABLESAMPLE METHOD,
allowing custom methods implemented as extensions would be useful?
Yes, I think so, CREATE/DROP TABLESAMPLE METHOD is on my TODO, but since
that's just simple mechanical work with no hard problems to solve there
I can add it later once we have agreement on the general approach of the
patch (especially in the terms of extensibility).
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 22.12.2014 10:07, Petr Jelinek wrote:
On 21/12/14 18:38, Tomas Vondra wrote:
(1) The patch adds a new catalog, but does not bump CATVERSION.
I thought this was always done by committer?
Right. Sorry for the noise.
(2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
as it squishes everything into a single chunk. That's inconsistent
with naming of the other catalogs. I think pg_table_sample_method
would be better.
Fair point, but perhaps pg_tablesample_method then as tablesample is
used as single word everywhere including the standard.
Sounds good.
(4) I noticed there's an interesting extension in SQL Server, which
allows specifying PERCENT or ROWS, so you can say
SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);
or
SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);
That seems handy, and it'd make migration from SQL Server easier.
What do you think?
Well doing it exactly this way somewhat kills the extensibility which
was one of the main goals for me - I allow any kind of parameters for
sampling and the handling of those depends solely on the sampling
method. This means that in my approach if you'd want to change what you
are limiting you'd have to write new sampling method and the query would
then look like SELECT * FROM table TABLESAMPLE SYSTEM_ROWLIMIT (2500);
or some such (depending on how you name the sampling method). Or SELECT
* FROM table TABLESAMPLE SYSTEM (2500, 'ROWS'); if we chose to extend
the SYSTEM sampling method, that would be also doable.
The reason for this is that I don't want to really limit too much what
parameters can be sent to sampling (I envision even SYSTEM_TIMED
sampling method that will get limit as time interval for example).
Good point.
(6) This seems slightly wrong, because of long/uint32 mismatch:
long seed = PG_GETARG_UINT32(1);
I think uint32 would be more appropriate, no?
Probably, although I need long later in the algorithm anyway.
Really? ISTM the sampler_setseed() does not really require long, uint32
would work exactly the same.
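A small standalone sketch of the mismatch, for reference. It mimics the assignment without PostgreSQL headers, so seed_via_long() and seed_survives_long() are hypothetical helpers and uint32_t stands in for uint32. The point is only that the result depends on the width of long, which differs between platforms (64 bits on typical LP64 Unix builds, 32 bits on 64-bit Windows).

```c
#include <assert.h>
#include <stdint.h>

/* Mimics `long seed = PG_GETARG_UINT32(1);` from the review comment. */
static long
seed_via_long(uint32_t arg)
{
    /* Implementation-defined result when arg does not fit into long. */
    long        seed = (long) arg;

    return seed;
}

/* Returns 1 when the value stored in the long equals the original seed. */
static int
seed_survives_long(uint32_t arg)
{
    long        seed = seed_via_long(arg);

    return seed >= 0 && (unsigned long long) seed == (unsigned long long) arg;
}
```

Small seeds behave the same everywhere; a seed above LONG_MAX keeps its value only where long is wider than 32 bits, which is an argument for carrying it as uint32 throughout.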
(9) The current regression tests only use the REPEATABLE cases. I
understand queries without this clause are RANDOM, but maybe we
could do something like this:
SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
) foo;
Granted, there's still a small probability of false positive, but
maybe that's sufficiently small? Or is the amount of code this
tests negligible?
Well, depending on fillfactor and limit it could be made quite reliable
I think, I also want to add test with VIEW (I think v2 has a bug there)
and perhaps some JOIN.
OK.
(10) In the initial patch you mentioned it's possible to write custom
sampling methods. Do you think a CREATE TABLESAMPLE METHOD,
allowing custom methods implemented as extensions would be useful?
Yes, I think so, CREATE/DROP TABLESAMPLE METHOD is on my TODO, but since
that's just simple mechanical work with no hard problems to solve there
I can add it later once we have agreement on the general approach of the
patch (especially in the terms of extensibility).
OK, good to know.
Tomas
On Thu, Dec 18, 2014 at 7:14 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi,
v2 version of this patch is attached.
a few more tests revealed that passing null as the sample size
argument works, and it shouldn't.
in repeatable it gives an error if i use null as argument but it gives
a syntax error, and it should be a data exception (data exception --
invalid repeat argument in a sample clause) according to the standard
also you need to add CHECK_FOR_INTERRUPTS somewhere, i tried with a
big table and had to wait a long time for it to finish
"""
regression=# select count(1) from tenk1 tablesample system (null);
 count
-------
    28
(1 row)

regression=# select count(1) from tenk1 tablesample bernoulli (null);
 count
-------
     0
(1 row)
"""
--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157
On 22/12/14 20:14, Jaime Casanova wrote:
On Thu, Dec 18, 2014 at 7:14 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi,
v2 version of this patch is attached.
a few more tests revealed that passing null as the sample size
argument works, and it shouldn't.
Fixed.
in repeatable it gives an error if i use null as argument but it gives
a syntax error, and it should be a data exception (data exception --
invalid repeat argument in a sample clause) according to the standard
Also fixed.
also you need to add CHECK_FOR_INTERRUPTS somewhere, i tried with a
big table and had to wait a long time for it to finish
Ah yeah, I can't rely on CHECK_FOR_INTERRUPTS in ExecScan because it
might take a while to fetch a row if percentage is very small and table
is big... Fixed.
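To spell out the problem: with a tiny percentage the sampler may walk thousands of tuples before it returns one, so a check that runs once per returned row is reached very rarely. A minimal standalone sketch of polling once per examined tuple instead follows. It is an illustration, not the v3 code: cancel_pending is a hypothetical stand-in for the backend's pending-interrupt state, rand() replaces sampler_random_fract(), and where exactly the patch places CHECK_FOR_INTERRUPTS is not shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for "a cancel request is pending". */
static volatile int cancel_pending = 0;

/* Uniform value in [0, 1); stand-in for the backend's random source. */
static double
uniform01(void)
{
    return rand() / ((double) RAND_MAX + 1.0);
}

/*
 * Walk ntuples candidate tuples, keeping each one with probability
 * `fraction`.  Returns the number of tuples kept, or -1 if the walk was
 * cancelled.  *checks reports how many times the cancel flag was polled.
 */
static long
bernoulli_walk(long ntuples, double fraction, long *checks)
{
    long        kept = 0;
    long        i;

    *checks = 0;
    for (i = 0; i < ntuples; i++)
    {
        /* Poll per examined tuple, not per returned tuple. */
        (*checks)++;
        if (cancel_pending)
            return -1;

        if (uniform01() >= fraction)
            continue;           /* tuple skipped */
        kept++;
    }
    return kept;
}
```

Even when nothing is returned (fraction 0), the flag is polled for every tuple, so a cancel request is noticed promptly regardless of the sample size.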
Attached is v3 which besides the fixes mentioned above also includes
changes discussed with Tomas (except the CREATE/DROP TABLESAMPLE
METHOD), fixes for crash with FETCH FIRST and is rebased against current
master.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
tablesample-v3.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..250ae29 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,38 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ The table sample clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ one of:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept only a single argument,
+ which is the percentage (floating point from 0 to 100) of the rows to
+ be returned.
+ The <literal>SYSTEM</literal> sampling method does block level
+ sampling with each block having the same chance of being selected and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> method scans the whole table and returns
+ individual rows with equal probability.
+ The optional numeric parameter <literal>REPEATABLE</literal> is used
+ as the random seed for sampling.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..595737c 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ transam tsm
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/tsm/Makefile b/src/backend/access/tsm/Makefile
new file mode 100644
index 0000000..73bbbd7
--- /dev/null
+++ b/src/backend/access/tsm/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for access/tsm
+#
+# IDENTIFICATION
+# src/backend/access/tsm/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/tsm
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = tsm_system.o tsm_bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/tsm/tsm_bernoulli.c b/src/backend/access/tsm/tsm_bernoulli.c
new file mode 100644
index 0000000..fd87fab
--- /dev/null
+++ b/src/backend/access/tsm/tsm_bernoulli.c
@@ -0,0 +1,200 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_bernoulli.c
+ * interface routines for BERNOULLI table sample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/access/tsm/tsm_bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/tsm_bernoulli.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ float4 samplesize; /* fraction of tuples to return (0.0-1.0) */
+ OffsetNumber lt; /* last tuple returned from current block */
+} BernoulliSamplerData;
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ Relation rel = scanstate->ss.ss_currentRelation;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks = RelationGetNumberOfBlocks(rel);
+ sampler->blockno = InvalidBlockNumber;
+ sampler->samplesize = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+
+ sampler_setseed(seed);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = 0;
+ else if (++sampler->blockno >= sampler->tblocks)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ *
+ * This method implements the main logic in bernoulli sampling.
+ * The algorithm simply generates new random number (in 0.0-1.0 range) and if
+ * it falls within user specified probability (in the same range) return the
+ * tuple offset.
+ *
+ * If we reach end of the block return InvalidOffsetNumber which tells
+ * SampleScan to go to next block.
+ */
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ double samplesize = sampler->samplesize;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /* Every tuple has percent chance of being returned */
+ while (sampler_random_fract() > samplesize)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_setseed(sampler->seed);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_bernoulli_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ SamplePath *path = (SamplePath *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(3);
+ double *tuples = (double *) PG_GETARG_POINTER(4);
+ List *args = path->tsmargs;
+ Node *pctnode;
+ float4 samplesize;
+
+ SamplerAccessStrategy *strategy = (SamplerAccessStrategy *) PG_GETARG_POINTER(5);
+
+ *strategy = SAS_SEQUENTIAL;
+ *pages = baserel->pages;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (!IsA(pctnode, Const))
+ {
+ *tuples = baserel->tuples * 0.1;
+ PG_RETURN_VOID();
+ }
+
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+
+ *tuples = baserel->tuples * samplesize;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/backend/access/tsm/tsm_system.c b/src/backend/access/tsm/tsm_system.c
new file mode 100644
index 0000000..dff7765
--- /dev/null
+++ b/src/backend/access/tsm/tsm_system.c
@@ -0,0 +1,187 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_system.c
+ * interface routines for system table sample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/access/tsm/tsm_system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/tsm_system.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+
+
+/*
+ * State
+ */
+typedef struct
+{
+ BlockSamplerData bs;
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ int samplesize; /* number of blocks to return */
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks = RelationGetNumberOfBlocks(scanstate->ss.ss_currentRelation);
+ sampler->samplesize = 1 + (int) (sampler->tblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ sampler_setseed(seed);
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses the same logic as ANALYZE for picking the random blocks.
+ */
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ sampler_setseed(sampler->seed);
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ SamplePath *path = (SamplePath *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(3);
+ double *tuples = (double *) PG_GETARG_POINTER(4);
+ List *args = path->tsmargs;
+ Node *pctnode;
+ float4 percent;
+
+ SamplerAccessStrategy *strategy = (SamplerAccessStrategy *) PG_GETARG_POINTER(5);
+
+ *strategy = SAS_RANDOM;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (!IsA(pctnode, Const))
+ {
+ *pages = baserel->pages * 0.1;
+ *tuples = baserel->tuples * 0.1;
+ PG_RETURN_VOID();
+ }
+
+ percent = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ percent /= 100.0;
+
+ *pages = baserel->pages * percent;
+ *tuples = baserel->tuples * percent;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..34db3e6 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesample_method.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 732ab22..4b011c7 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -88,10 +78,6 @@ static BufferAccessStrategy vac_strategy;
static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -947,94 +933,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1089,6 +987,8 @@ acquire_sample_rows(Relation onerel, int elevel,
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
OldestXmin = GetOldestXmin(onerel, true);
+ /* Seed the sampler random number generator */
+ sampler_setseed(random());
/* Prepare for sampling block numbers */
BlockSampler_Init(&bs, totalblocks, targrows);
/* Prepare for sampling rows */
@@ -1249,7 +1149,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1308,13 +1208,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
/*
* These two routines embody Algorithm Z from "Random sampling with a
* reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
@@ -1333,7 +1226,7 @@ double
anl_init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
+ return exp(-log(sampler_random_fract()) / n);
}
double
@@ -1348,7 +1241,7 @@ anl_get_next_S(double t, int n, double *stateptr)
double V,
quot;
- V = anl_random_fract(); /* Generate V */
+ V = sampler_random_fract(); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -1380,7 +1273,7 @@ anl_get_next_S(double t, int n, double *stateptr)
tmp;
/* Generate U and X */
- U = anl_random_fract();
+ U = sampler_random_fract();
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -1409,7 +1302,7 @@ anl_get_next_S(double t, int n, double *stateptr)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
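The comment removed with BlockSampler_Next above explains why one random number per selected block is enough for Knuth's Algorithm S. As a rough standalone sketch of that idea (it mirrors the code removed from analyze.c in the hunk above, but every `sketch_` name and the small LCG standing in for sampler_random_fract() are assumptions made only so the example compiles on its own):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the BlockSampler state. */
typedef struct BlockSamplerSketch
{
	uint32_t	N;		/* number of blocks in the table */
	uint32_t	n;		/* requested sample size */
	uint32_t	t;		/* blocks scanned so far */
	uint32_t	m;		/* blocks selected so far */
	uint64_t	rng;	/* state of the stand-in generator */
} BlockSamplerSketch;

/* Uniform value strictly inside (0, 1); assumed LCG, not the real PRNG. */
static double
sketch_random_fract(uint64_t *state)
{
	*state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
	return ((double) (*state >> 33) + 1.0) / ((double) 0x7fffffff + 2.0);
}

static void
sketch_init(BlockSamplerSketch *bs, uint32_t nblocks, uint32_t samplesize,
			uint64_t seed)
{
	bs->N = nblocks;
	bs->n = samplesize;
	bs->t = 0;
	bs->m = 0;
	bs->rng = seed;
}

static bool
sketch_has_more(const BlockSamplerSketch *bs)
{
	return (bs->t < bs->N) && (bs->m < bs->n);
}

static uint32_t
sketch_next(BlockSamplerSketch *bs)
{
	uint32_t	K = bs->N - bs->t;	/* remaining blocks */
	uint32_t	k = bs->n - bs->m;	/* blocks still to sample */
	double		p;					/* probability to skip the block */
	double		V;					/* single random draw per selection */

	assert(sketch_has_more(bs));

	if (k >= K)
	{
		/* need all the rest */
		bs->m++;
		return bs->t++;
	}

	V = sketch_random_fract(&bs->rng);
	p = 1.0 - (double) k / (double) K;
	while (V < p)
	{
		/* skip this block and shrink the cutoff instead of redrawing V */
		bs->t++;
		K--;
		p *= 1.0 - (double) k / (double) K;
	}

	/* select */
	bs->m++;
	return bs->t++;
}
```

Calling sketch_next() while sketch_has_more() is true yields strictly increasing block numbers, and exactly min(N, n) of them, because p drops to zero as soon as K reaches k.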
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 064f880..d5d703d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -724,6 +724,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -950,6 +951,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ pname = sname = "Sample Scan";
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1067,6 +1071,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1319,6 +1324,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2147,6 +2153,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 7027d7f..1826059 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index d5079ef..613f799 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index e27c062..a1cba97 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..27f5f05
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,404 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/pg_tablesample_method.h"
+#include "executor/executor.h"
+#include "executor/nodeSamplescan.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate, int eflags);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+
+/*
+ * Initialize the sampling method - loads function info and
+ * calls the tsminit function.
+ *
+ * We need special handling for this because the tsminit function
+ * is allowed to take optional additional arguments.
+ */
+static void
+InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(scanstate->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(scanstate->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(scanstate->tsmnexttuple));
+ fmgr_info(tablesample->tsmend, &(scanstate->tsmend));
+ fmgr_info(tablesample->tsmreset, &(scanstate->tsmreset));
+
+ InitFunctionCallInfoData(fcinfo, &scanstate->tsminit,
+ list_length(args) + 2,
+ InvalidOid, NULL, NULL);
+
+ /* First arg is always SampleScanState */
+ fcinfo.arg[0] = PointerGetDatum(scanstate);
+ fcinfo.argnull[0] = false;
+
+ /*
+ * Second arg is always the REPEATABLE seed.
+ * When tablesample->repeatable is NULL, the REPEATABLE clause was not
+ * specified and a random seed is used instead.
+ * When specified, the expression must not evaluate to NULL.
+ */
+ if (tablesample->repeatable)
+ {
+ ExprState *argstate = ExecInitExpr((Expr *) tablesample->repeatable,
+ (PlanState *) scanstate);
+ fcinfo.arg[1] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[1], NULL);
+ if (fcinfo.argnull[1])
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("REPEATABLE clause cannot be NULL")));
+ }
+ else
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+
+ i = 2;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ (void) FunctionCallInvoke(&fcinfo);
+}
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is a workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ EState *estate;
+ TupleTableSlot *slot;
+ BlockNumber blockno = InvalidBlockNumber;
+ Snapshot snapshot;
+ Relation relation;
+ bool found = false;
+ bool retry = false;
+ OffsetNumber tupoffset, maxoffset;
+ Buffer buffer;
+ Page page;
+ HeapTuple tuple = &(node->tup);
+
+ /*
+ * get information from the estate and scan state
+ */
+ estate = node->ss.ps.state;
+ snapshot = estate->es_snapshot;
+ slot = node->ss.ss_ScanTupleSlot;
+ relation = node->ss.ss_currentRelation;
+ buffer = node->openbuffer;
+
+ if (BufferIsValid(buffer))
+ {
+ blockno = BufferGetBlockNumber(buffer);
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ /*
+ * get the next tuple from the table
+ */
+ for (;;)
+ {
+ ItemId itemid;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Load next block if needed. */
+ if (!BufferIsValid(buffer))
+ {
+ blockno = DatumGetInt32(FunctionCall2(&node->tsmnextblock,
+ PointerGetDatum(node),
+ BoolGetDatum(retry)));
+ /* No more blocks to fetch */
+ if (!BlockNumberIsValid(blockno))
+ break;
+
+ buffer = ReadBufferExtended(relation, MAIN_FORKNUM, blockno,
+ RBM_NORMAL, NULL);
+ LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+ node->openbuffer = buffer;
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ tupoffset = DatumGetUInt16(FunctionCall4(&node->tsmnexttuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset),
+ BoolGetDatum(retry)));
+ /* Go to next block. */
+ if (!OffsetNumberIsValid(tupoffset))
+ {
+ UnlockReleaseBuffer(buffer);
+ node->openbuffer = buffer = InvalidBuffer;
+ continue;
+ }
+ retry = true;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_tableOid = RelationGetRelid(relation);
+ tuple->t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&tuple->t_self, blockno, tupoffset);
+
+ /* Found visible tuple, return it. */
+ if (HeapTupleSatisfiesVisibility(tuple, snapshot, buffer))
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (found)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ buffer, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation using the sampling method and returns
+ * the next qualifying tuple.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+ node->ss.ss_currentScanDesc = NULL;
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ scanstate->openbuffer = InvalidBuffer;
+
+ InitSamplingMethod(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell sampling function that we finished the scan.
+ */
+ FunctionCall1(&node->tsmend, PointerGetDatum(node));
+
+ if (BufferIsValid(node->openbuffer))
+ {
+ UnlockReleaseBuffer(node->openbuffer);
+ node->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *scanstate)
+{
+ if (BufferIsValid(scanstate->openbuffer))
+ {
+ UnlockReleaseBuffer(scanstate->openbuffer);
+ scanstate->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ FunctionCall1(&scanstate->tsmreset, PointerGetDatum(scanstate));
+
+ ExecScanReScan(&scanstate->ss);
+}
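To make the calling convention between SampleNext and a sampling method easier to review, here is a minimal standalone sketch of the nextblock/nexttuple loop. It is not the executor code: buffers, visibility checks and the retry flag are left out, and all `Sketch`/`sketch_`/`eo_` names, as well as the toy "every other block" method, are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_INVALID_BLOCK	UINT32_MAX
#define SKETCH_INVALID_OFFSET	0	/* offsets are 1-based, as on heap pages */

/* Hypothetical stand-in for the nextblock/nexttuple part of the API. */
typedef struct SketchMethod
{
	uint32_t	(*nextblock) (void *state);
	uint16_t	(*nexttuple) (void *state, uint32_t blockno,
							  uint16_t maxoffset);
	void	   *state;
} SketchMethod;

typedef struct SketchScan
{
	SketchMethod method;
	uint16_t	tuples_per_block;	/* stands in for PageGetMaxOffsetNumber() */
	bool		have_block;
	uint32_t	blockno;
} SketchScan;

/* Fetch the next sampled (block, offset); returns false when the scan ends. */
static bool
sketch_sample_next(SketchScan *scan, uint32_t *blockno, uint16_t *offset)
{
	for (;;)
	{
		uint16_t	off;

		/* Ask the method for a new block when none is open. */
		if (!scan->have_block)
		{
			scan->blockno = scan->method.nextblock(scan->method.state);
			if (scan->blockno == SKETCH_INVALID_BLOCK)
				return false;
			scan->have_block = true;
		}

		off = scan->method.nexttuple(scan->method.state, scan->blockno,
									 scan->tuples_per_block);
		if (off == SKETCH_INVALID_OFFSET)
		{
			/* Block exhausted, go to the next one. */
			scan->have_block = false;
			continue;
		}

		*blockno = scan->blockno;
		*offset = off;
		return true;
	}
}

/* Toy method: visit every other block and return all of its tuples. */
typedef struct EveryOtherState
{
	uint32_t	nblocks;
	uint32_t	next;
	uint16_t	lastoff;
} EveryOtherState;

static uint32_t
eo_nextblock(void *state)
{
	EveryOtherState *st = (EveryOtherState *) state;
	uint32_t	b;

	if (st->next >= st->nblocks)
		return SKETCH_INVALID_BLOCK;
	b = st->next;
	st->next += 2;
	st->lastoff = 0;
	return b;
}

static uint16_t
eo_nexttuple(void *state, uint32_t blockno, uint16_t maxoffset)
{
	EveryOtherState *st = (EveryOtherState *) state;

	(void) blockno;
	if (st->lastoff >= maxoffset)
		return SKETCH_INVALID_OFFSET;
	return ++st->lastoff;
}
```

The point of splitting the API this way is that a block-level method (SYSTEM) can do all of its work in nextblock, while a tuple-level method (BERNOULLI) can walk every block and decide per offset in nexttuple.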
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 491e4db..d69cc4e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -628,6 +628,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2006,6 +2022,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2138,6 +2155,37 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_SCALAR_FIELD(tsmcost);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4077,6 +4125,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4725,6 +4776,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_PrivGrantee:
retval = _copyPrivGrantee(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 0803674..83c5a25 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2325,6 +2325,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2444,6 +2445,33 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_SCALAR_FIELD(tsmcost);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3152,6 +3180,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_PrivGrantee:
retval = _equalPrivGrantee(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index ae857a0..66b40dc 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3209,6 +3209,18 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->repeatable, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e3e29f5..7018512 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -578,6 +578,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -1589,6 +1597,17 @@ _outTidPath(StringInfo str, const TidPath *node)
}
static void
+_outSamplePath(StringInfo str, const SamplePath *node)
+{
+ WRITE_NODE_TYPE("SAMPLEPATH");
+
+ _outPathInfo(str, (const Path *) node);
+
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(tsmargs);
+}
+
+static void
_outForeignPath(StringInfo str, const ForeignPath *node)
{
WRITE_NODE_TYPE("FOREIGNPATH");
@@ -2391,6 +2410,33 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2420,6 +2466,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2887,6 +2934,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3092,6 +3142,9 @@ _outNode(StringInfo str, const void *obj)
case T_TidPath:
_outTidPath(str, obj);
break;
+ case T_SamplePath:
+ _outSamplePath(str, obj);
+ break;
case T_ForeignPath:
_outForeignPath(str, obj);
break;
@@ -3228,6 +3280,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a3efdd4..3a510dd 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,43 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_OID_FIELD(tsmcost);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1216,6 +1253,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1311,6 +1349,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 449fdc3..dffab29 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,8 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +425,34 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+
+ /*
+ * We don't support pushing join clauses into the quals of a sample scan,
+ * but it could still have required parameterization due to LATERAL refs in
+ * its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
+
+ /*
+ * There is only one plan to consider but we still need to set
+ * parameters for RelOptInfo.
+ */
+ set_cheapest(rel);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 659daa2..d0741f0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -88,6 +88,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/parsetree.h"
#include "utils/lsyscache.h"
+#include "utils/sampling.h"
#include "utils/selfuncs.h"
#include "utils/spccache.h"
#include "utils/tuplesort.h"
@@ -219,6 +220,72 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * From the planner/optimizer perspective, we don't care all that much about
+ * the cost itself, since there is only ever one scan path to consider when a
+ * sampling scan is requested, but the row count estimate is still important.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'path->path.param_info' is the ParamPathInfo if parameterized, else NULL
+ */
+void
+cost_samplescan(SamplePath *path, PlannerInfo *root, RelOptInfo *baserel)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+ BlockNumber pages;
+ double tuples;
+ SamplerAccessStrategy strategy;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Call the sampling method's costing function. */
+ OidFunctionCall6(path->tsmcost, PointerGetDatum(root),
+ PointerGetDatum(path), PointerGetDatum(baserel),
+ PointerGetDatum(&pages), PointerGetDatum(&tuples),
+ PointerGetDatum(&strategy));
+
+ /* Mark the path with the correct row estimate */
+ if (path->path.param_info)
+ path->path.rows = path->path.param_info->ppi_rows;
+ else
+ path->path.rows = tuples;
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+
+ spc_page_cost = strategy == SAS_RANDOM ?
+ spc_random_page_cost : spc_seq_page_cost;
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_page_cost * pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, path->path.param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * tuples;
+
+ path->path.startup_cost = startup_cost;
+ path->path.total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
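For reviewers who want to check the arithmetic in cost_samplescan without building the tree, here is a minimal standalone sketch of the same formula. The names SketchCost, SketchAccess and sketch_cost_samplescan are hypothetical and not part of the patch; the page and tuple counts, which the patch obtains from the sampling method's cost function, are passed in directly as plain doubles.

```c
#include <assert.h>

/* Hypothetical stand-in for the access strategy reported by the method. */
typedef enum
{
    SKETCH_SEQUENTIAL,
    SKETCH_RANDOM
} SketchAccess;

/* Hypothetical result type holding the two costs stored on the path. */
typedef struct
{
    double      startup;
    double      total;
} SketchCost;

/*
 * Mirror of the arithmetic in cost_samplescan: disk cost is the per-page
 * cost times the estimated pages, CPU cost is the per-tuple cost (base
 * tuple cost plus qual evaluation) times the estimated tuples.
 */
static SketchCost
sketch_cost_samplescan(double pages, double tuples, SketchAccess strategy,
                       double seq_page_cost, double random_page_cost,
                       double cpu_tuple_cost,
                       double qual_startup, double qual_per_tuple)
{
    SketchCost  cost;
    double      page_cost;
    double      run_cost;

    assert(pages >= 0.0 && tuples >= 0.0);

    /* pick the page cost matching the reported access pattern */
    page_cost = (strategy == SKETCH_RANDOM) ? random_page_cost : seq_page_cost;

    run_cost = page_cost * pages;                           /* disk costs */
    run_cost += (cpu_tuple_cost + qual_per_tuple) * tuples; /* CPU costs */

    cost.startup = qual_startup;
    cost.total = cost.startup + run_cost;
    return cost;
}
```

With 100 pages at a sequential page cost of 1.0 and 1000 tuples at 0.0125 per tuple, the sketch yields a total of about 112.5; switching to random access at 4.0 per page raises it to about 412.5.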
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8f9ae4f..1056885 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -58,6 +58,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -100,6 +102,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -228,6 +231,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -343,6 +347,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -546,6 +557,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1133,6 +1145,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3318,6 +3369,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 4d3fbca..0d78f27 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -446,6 +446,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 579d021..7da1a44 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2163,6 +2163,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 319e8b2..3c2c1b8 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,33 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+SamplePath *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ SamplePath *pathnode = makeNode(SamplePath);
+ RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
+ TableSampleClause *tablesample = rte->tablesample;
+
+ Assert(tablesample);
+
+ pathnode->path.pathtype = T_SampleScan;
+ pathnode->path.parent = rel;
+ pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->path.pathkeys = NIL; /* samplescan has unordered result */
+
+ pathnode->tsmcost = tablesample->tsmcost;
+ pathnode->tsmargs = tablesample->args;
+
+ cost_samplescan(pathnode, root, rel);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1921,6 +1948,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return (Path *) create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 1f4fe9d..0eb81ae 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -447,6 +447,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -611,8 +612,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10137,6 +10138,12 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample opt_alias_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10432,7 +10439,6 @@ relation_expr_list:
| relation_expr_list ',' relation_expr { $$ = lappend($1, $3); }
;
-
/*
* Given "UPDATE foo set set ...", we have to decide without looking any
* further ahead whether the first "set" is an alias or the UPDATE's SET
@@ -10462,6 +10468,31 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $2;
+ n->relation = $1;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = $4;
+ n->repeatable = $6;
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13244,7 +13275,6 @@ unreserved_keyword:
| RELATIVE_P
| RELEASE
| RENAME
- | REPEATABLE
| REPLACE
| REPLICA
| RESET
@@ -13419,6 +13449,7 @@ type_func_name_keyword:
| OVERLAPS
| RIGHT
| SIMILAR
+ | TABLESAMPLE
| VERBOSE
;
@@ -13487,6 +13518,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+ | REPEATABLE
| RETURNING
| SELECT
| SESSION_USER
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 4931dca..6d64a84 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -413,6 +416,19 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
+{
+ RangeTblEntry *rte;
+ TableSampleClause *tablesample = NULL;
+
+ rte = transformTableEntry(pstate, r->relation);
+ tablesample = ParseTableSample(pstate, r->method, r->repeatable, r->args);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -421,7 +437,7 @@ transformTableEntry(ParseState *pstate, RangeVar *r)
{
RangeTblEntry *rte;
- /* We need only build a range table entry */
+ /* We first need to build a range table entry */
rte = addRangeTableEntry(pstate, r, r->alias,
interpretInhOption(r->inhOpt), true);
@@ -1121,6 +1137,26 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 9ebd3fd..b4ef9b8 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesample_method.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -760,6 +762,133 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
+ List *sampleargs)
+{
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, initnargs;
+ Oid actual_arg_types[FUNC_MAX_ARGS];
+ Oid init_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the table sample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("table sample method \"%s\" does not exist",
+ samplemethod)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+ tablesample->tsmcost = tsm->tsmcost;
+
+ ReleaseSysCache(tuple);
+
+ /* Validate the parameters against init function definition. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ initnargs = procform->pronargs;
+ Assert(initnargs >= 3);
+
+ /*
+ * The first parameter is used to pass the SampleScanState and the second
+ * is the seed (REPEATABLE). Skip processing them here; just assert
+ * that their types are correct.
+ * XXX: maybe make this ereport?
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ Assert(procform->proargtypes.values[1] == INT4OID);
+ initnargs -= 2;
+ memcpy(init_arg_types, procform->proargtypes.values + 2,
+ initnargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Process repeatable (seed) */
+ if (repeatable != NULL)
+ {
+ Node *arg = repeatable;
+
+ if (arg && IsA(arg, A_Const))
+ {
+ A_Const *con = (A_Const *) arg;
+
+ if (con->val.type == T_Null)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value"),
+ parser_errposition(pstate, con->location)));
+
+ }
+
+ arg = transformExpr(pstate, arg, EXPR_KIND_FROM_FUNCTION);
+ arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
+ tablesample->repeatable = arg;
+ }
+ else
+ tablesample->repeatable = NULL;
+
+ /* Transform the rest of arguments ... */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *arg = transformExpr(pstate, (Node *) lfirst(larg), EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ fargs = lappend(fargs, arg);
+
+ actual_arg_types[nargs++] = argtype;
+ }
+
+ /*
+ * Check if parameters are correct.
+ *
+ * XXX: can we do better at hinting here?
+ */
+ if (initnargs != nargs ||
+ !can_coerce_type(initnargs, actual_arg_types, init_arg_types,
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameters for TABLESAMPLE method \"%s\"",
+ samplemethod)));
+
+ /* perform the necessary typecasting of arguments */
+ make_fn_arguments(pstate, fargs, actual_arg_types, init_arg_types);
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 24ade6c..65ca5bf 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -31,6 +31,7 @@
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -343,6 +344,8 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags);
static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags, int wrapColumn);
+static void get_tablesample_def(TableSampleClause *tablesample,
+ deparse_context *context);
static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
TupleDesc resultDesc,
int prettyFlags, int wrapColumn, int startIndent);
@@ -4157,6 +4160,50 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
heap_close(ev_relation, AccessShareLock);
}
+/* ----------
+ * get_tablesample_def - Convert TableSampleClause back to SQL
+ * ----------
+ */
+static void
+get_tablesample_def(TableSampleClause *tablesample, deparse_context *context)
+{
+ StringInfo buf = context->buf;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ char *tsmname;
+ int nargs;
+ ListCell *l;
+
+ /* Load the table sample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tablesample->tsmid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("cache lookup failed for table sample method %u",
+ tablesample->tsmid)));
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ tsmname = NameStr(tsm->tsmname);
+ appendStringInfo(buf, " TABLESAMPLE %s (", quote_identifier(tsmname));
+
+ ReleaseSysCache(tuple);
+
+ nargs = 0;
+ foreach(l, tablesample->args)
+ {
+ if (nargs++ > 0)
+ appendStringInfoString(buf, ", ");
+ get_rule_expr((Node *) lfirst(l), context, true);
+ }
+ appendStringInfoChar(buf, ')');
+
+ if (tablesample->repeatable != NULL)
+ {
+ appendStringInfoString(buf, " REPEATABLE (");
+ get_rule_expr(tablesample->repeatable, context, true);
+ appendStringInfoChar(buf, ')');
+ }
+}
/* ----------
* get_query_def - Parse back one query parsetree
@@ -8384,6 +8431,9 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
only_marker(rte),
generate_relation_name(rte->relid,
context->namespaces));
+
+ if (rte->tablesample)
+ get_tablesample_def(rte->tablesample, context);
break;
case RTE_SUBQUERY:
/* Subquery RTE */
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 94d951c..11d560e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesample_method_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 449d5b4..848ba29 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..71a91f9
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,130 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Block sampling routines shared by ANALYZE and TABLESAMPLE.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+static unsigned short _sampler_seed[3] = { 0x330e, 0xabcd, 0x1234 };
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler is used for stage one of our new two-stage tuple
+ * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
+ * "Large DB"). It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires a sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+void
+sampler_setseed(long seed)
+{
+ _sampler_seed[0] = 0x330e;
+ _sampler_seed[1] = (unsigned short) seed;
+ _sampler_seed[2] = (unsigned short) (seed >> 16);
+}
+
+/* Select a random value R uniformly distributed in (0 - 1) */
+double
+sampler_random_fract(void)
+{
+ return pg_erand48(_sampler_seed);
+}
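To make the skip loop in BlockSampler_Next easier to reason about, here is a minimal standalone sketch of the same selection scheme (Knuth's Algorithm S with one random draw per selected block). It is only an illustration under stated assumptions: SketchSampler and the sketch_* functions are hypothetical names, and a simple 64-bit linear congruential generator stands in for pg_erand48 so the file compiles outside the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for BlockSampler; N, n, t, m follow the patch. */
typedef struct
{
    uint32_t    N;      /* total number of blocks */
    int         n;      /* desired sample size */
    uint32_t    t;      /* blocks scanned so far */
    int         m;      /* blocks selected so far */
    uint64_t    rng;    /* state of the illustrative generator */
} SketchSampler;

/* Illustrative generator (not pg_erand48): returns a double in [0, 1). */
static double
sketch_random_fract(SketchSampler *s)
{
    s->rng = s->rng * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double) (s->rng >> 11) / 9007199254740992.0;   /* divide by 2^53 */
}

static void
sketch_init(SketchSampler *s, uint32_t nblocks, int samplesize, uint64_t seed)
{
    s->N = nblocks;
    s->n = samplesize;
    s->t = 0;
    s->m = 0;
    s->rng = seed;
}

static int
sketch_has_more(const SketchSampler *s)
{
    return s->t < s->N && s->m < s->n;
}

static uint32_t
sketch_next(SketchSampler *s)
{
    uint32_t    K = s->N - s->t;    /* remaining blocks */
    int         k = s->n - s->m;    /* blocks still to sample */
    double      p;                  /* probability of skipping a block */
    double      V;                  /* single random draw for this call */

    assert(sketch_has_more(s));     /* hence K > 0 and k > 0 */

    if ((uint32_t) k >= K)
    {
        /* need all the rest */
        s->m++;
        return s->t++;
    }

    V = sketch_random_fract(s);
    p = 1.0 - (double) k / (double) K;
    while (V < p)
    {
        /* skip this block and shrink the cutoff for the next one */
        s->t++;
        K--;
        p *= 1.0 - (double) k / (double) K;
    }

    /* select */
    s->m++;
    return s->t++;
}
```

Calling sketch_next while sketch_has_more returns true yields block numbers in strictly increasing order, and exactly samplesize of them whenever the table has at least that many blocks, because p drops to zero once K equals k.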
diff --git a/src/include/access/tsm_bernoulli.h b/src/include/access/tsm_bernoulli.h
new file mode 100644
index 0000000..00cd069
--- /dev/null
+++ b/src/include/access/tsm_bernoulli.h
@@ -0,0 +1,20 @@
+/*--------------------------------------------------------------------------
+ * tsm_bernoulli.h
+ * Header file for BERNOULLI table sampling method.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/access/tsm_bernoulli.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TSM_BERNOULLI_H
+#define TSM_BERNOULLI_H
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_cost(PG_FUNCTION_ARGS);
+
+#endif /* TSM_BERNOULLI_H */
diff --git a/src/include/access/tsm_system.h b/src/include/access/tsm_system.h
new file mode 100644
index 0000000..4021470
--- /dev/null
+++ b/src/include/access/tsm_system.h
@@ -0,0 +1,20 @@
+/*--------------------------------------------------------------------------
+ * tsm_system.h
+ * Header file for SYSTEM table sampling method.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/access/tsm_system.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TSM_SYSTEM_H
+#define TSM_SYSTEM_H
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_system_cost(PG_FUNCTION_ARGS);
+
+#endif /* TSM_SYSTEM_H */
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index bde1a84..b06a791 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_name_index, 3281, on pg_tablesample_method using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3281
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_oid_index, 3282, on pg_tablesample_method using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3282
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index f766ed7..9c77957 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5136,6 +5136,31 @@ DESCR("rank of hypothetical row without gaps");
DATA(insert OID = 3993 ( dense_rank_final PGNSP PGUID 12 1 0 2276 0 f f f f f f i 2 0 20 "2281 2276" "{2281,2276}" "{i,v}" _null_ _null_ hypothetical_dense_rank_final _null_ _null_ _null_ ));
DESCR("aggregate final function");
+DATA(insert OID = 3285 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3286 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 23 "2281" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3287 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 21 "2281" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3288 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3289 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+DATA(insert OID = 3290 ( tsm_system_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "700" _null_ _null_ _null_ _null_ tsm_system_cost _null_ _null_ _null_ ));
+DESCR("tsm_system_cost(internal)");
+
+DATA(insert OID = 3291 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3292 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 23 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3293 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 21 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3294 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3296 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
+DATA(insert OID = 3297 ( tsm_bernoulli_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "700" _null_ _null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_cost(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
new file mode 100644
index 0000000..0e4a716
--- /dev/null
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesample_method.h
+ * definition of the table sampling methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesample_method.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLE_METHOD_H
+#define PG_TABLESAMPLE_METHOD_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_tablesample_method definition. cpp turns this into
+ * typedef struct FormData_pg_tablesample_method
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3280
+
+CATALOG(pg_tablesample_method,3280)
+{
+ NameData tsmname; /* tablesample method name */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockNumber if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmend; /* end scan function */
+ regproc tsmreset; /* reset state - used by rescan */
+ regproc tsmcost; /* costing function */
+} FormData_pg_tablesample_method;
+
+/* ----------------
+ * Form_pg_tablesample_method corresponds to a pointer to a tuple with
+ * the format of pg_tablesample_method relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
+
+/* ----------------
+ * compiler constants for pg_tablesample_method
+ * ----------------
+ */
+#define Natts_pg_tablesample_method 7
+#define Anum_pg_tablesample_method_tsmname 1
+#define Anum_pg_tablesample_method_tsminit 2
+#define Anum_pg_tablesample_method_tsmnextblock 3
+#define Anum_pg_tablesample_method_tsmnexttuple 4
+#define Anum_pg_tablesample_method_tsmend 5
+#define Anum_pg_tablesample_method_tsmreset 6
+#define Anum_pg_tablesample_method_tsmcost 7
+
+/* ----------------
+ * initial contents of pg_tablesample_method
+ * ----------------
+ */
+
+DATA(insert OID = 3283 ( system tsm_system_init tsm_system_nextblock tsm_system_nexttuple tsm_system_end tsm_system_reset tsm_system_cost ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3284 ( bernoulli tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41b13b2..b7f3129 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1212,6 +1212,26 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+
+ /* Sampling method functions. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+
+ Buffer openbuffer; /* currently open buffer */
+ HeapTupleData tup; /* last tuple */
+
+ void *tsmdata; /* for use by table scan method */
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index bc71fea..01d4795 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,6 +61,7 @@ typedef enum NodeTag
T_ValuesScan,
T_CteScan,
T_WorkTableScan,
+ T_SampleScan,
T_ForeignScan,
T_CustomScan,
T_Join,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -225,6 +227,7 @@ typedef enum NodeTag
T_MergePath,
T_HashPath,
T_TidPath,
+ T_SamplePath,
T_ForeignPath,
T_CustomPath,
T_AppendPath,
@@ -413,6 +416,8 @@ typedef enum NodeTag
T_XmlSerialize,
T_WithClause,
T_CommonTableExpr,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 458eeb0..62c2c57 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -307,6 +307,23 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - information about a sampling method
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmend;
+ Oid tsmreset;
+ Oid tsmcost;
+ Node *repeatable;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -507,6 +524,21 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than SQL Standard so we pass generic function
+ * arguments to the sampling method.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ Node *repeatable;
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -751,6 +783,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 48203a0..8427b44 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -278,6 +278,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 7116496..064e336 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -870,6 +870,18 @@ typedef struct TidPath
} TidPath;
/*
+ * SamplePath represents a sample scan
+ *
+ * tsmargs is the list of parameters for the TABLESAMPLE clause
+ */
+typedef struct SamplePath
+{
+ Path path;
+ Oid tsmcost; /* table sample method costing function */
+ List *tsmargs; /* arguments to a TABLESAMPLE clause */
+} SamplePath;
+
+/*
* ForeignPath represents a potential scan of a foreign table
*
* fdw_private stores FDW private data about the scan. While fdw_private is
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 75e2afb..97bc0ba 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(SamplePath *path, PlannerInfo *root, RelOptInfo *baserel);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 26b17f5..6c0a6cf 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern SamplePath *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index e14dc9a..e565082 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -312,7 +312,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD)
PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD)
PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD)
-PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD)
+PG_KEYWORD("repeatable", REPEATABLE, RESERVED_KEYWORD)
PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD)
PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD)
PG_KEYWORD("reset", RESET, UNRESERVED_KEYWORD)
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 4423bc0..0ba9768 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,10 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod,
+ Node *repeatable, List *args);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 48ebf59..1ba06b6 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..734cdc0
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,49 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+typedef enum SamplerAccessStrategy
+{
+ SAS_RANDOM,
+ SAS_SEQUENTIAL
+} SamplerAccessStrategy;
+
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Vitter reservoir sampling functions */
+extern double vitter_init_selection_state(int n);
+extern double vitter_get_next_S(double t, int n, double *stateptr);
+
+/* Random generator */
+extern void sampler_setseed(long seed);
+extern double sampler_random_fract(void);
+
+#endif /* SAMPLING_H */
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index f97229f..29244c7 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..5946edf 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesample_method|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..436b754
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,77 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+ count
+-------
+ 10
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+ pg_get_viewdef
+--------------------------------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system (((10 * 2))::real) REPEATABLE (2);
+(1 row)
+
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+ pg_get_viewdef
+-----------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system ((99)::real);
+(1 row)
+
+DROP TABLE test_tablesample CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to view test_tablesample_v1
+drop cascades to view test_tablesample_v2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 62cc198..cf789dc 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity tablesample
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 07fc827..852fed9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -151,3 +151,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..e8313a3
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,17 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+
+DROP TABLE test_tablesample CASCADE;
On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Attached is v3 which besides the fixes mentioned above also includes changes
discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for
crash with FETCH FIRST and is rebased against current master.
This patch needs a rebase, there is a small conflict in parallel_schedule.
Structurally speaking, I think that the tsm methods should be added in
src/backend/utils and not src/backend/access which is more high-level
as tsm_bernoulli.c and tsm_system.c contain only a set of new
procedure functions. Having a single header file tsm.h would be also a
good thing.
Regarding the naming, is "tsm" (table sample method) really appealing?
Wouldn't it be better to use simply tablesample_* for the file names
and the method names?
This is a large patch... Wouldn't sampling.[c|h] extracted from
ANALYZE live better as a refactoring patch? This would limit a bit bug
occurrences on the main patch.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 06/01/15 08:51, Michael Paquier wrote:
On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Attached is v3 which besides the fixes mentioned above also includes changes
discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for
crash with FETCH FIRST and is rebased against current master.
This patch needs a rebase, there is a small conflict in parallel_schedule.
Sigh, I really wish we had automation that checks this automatically for
patches in CF.
Structurally speaking, I think that the tsm methods should be added in
src/backend/utils and not src/backend/access which is more high-level
as tsm_bernoulli.c and tsm_system.c contain only a set of new
I am not sure if I parsed this correctly, do you mean to say that only
low-level access functions belong to src/backend/access? Makes sense.
procedure functions. Having a single header file tsm.h would be also a
good thing.
I was thinking about a single header too; I didn't find a good precedent so
I went with two, but I have no problem doing one :).
Regarding the naming, is "tsm" (table sample method) really appealing?
Wouldn't it be better to use simply tablesample_* for the file names
and the method names?
Doesn't really matter to me, I just really don't want to have
tablesample_method_* there as that's way too long for my taste. I'll
think about the naming when I move it to src/backend/utils.
This is a large patch... Wouldn't sampling.[c|h] extracted from
ANALYZE live better as a refactoring patch? This would limit a bit bug
occurrences on the main patch.
That's a good idea, I'll split it into patch series.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2015-01-06 14:22:16 +0100, Petr Jelinek wrote:
On 06/01/15 08:51, Michael Paquier wrote:
On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Attached is v3 which besides the fixes mentioned above also includes changes
discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for
crash with FETCH FIRST and is rebased against current master.
This patch needs a rebase, there is a small conflict in parallel_schedule.
Sigh, I really wish we had automation that checks this automatically for
patches in CF.
FWIW, I personally think minor conflicts aren't really an issue and
don't really require a rebase. At least if the patches are in git
format, the reviewer can just use the version they're based on. Perhaps
always stating which version of the tree the patches apply on would be
good practice.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/01/15 14:22, Petr Jelinek wrote:
On 06/01/15 08:51, Michael Paquier wrote:
On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Attached is v3 which besides the fixes mentioned above also includes changes
discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for
crash with FETCH FIRST and is rebased against current master.
This patch needs a rebase, there is a small conflict in parallel_schedule.
Sigh, I really wish we had automation that checks this automatically for
patches in CF.
Here is a rebase against current master.
Structurally speaking, I think that the tsm methods should be added in
src/backend/utils and not src/backend/access which is more high-level
as tsm_bernoulli.c and tsm_system.c contain only a set of new
I am not sure if I parsed this correctly, do you mean to say that only
low-level access functions belong to src/backend/access? Makes sense.
Made this change.
procedure functions. Having a single header file tsm.h would be also a
good thing.
Moved into single tablesample.h file.
Regarding the naming, is "tsm" (table sample method) really appealing?
Wouldn't it be better to use simply tablesample_* for the file names
and the method names?
I created src/backend/tablesample, and the files are named just system.c
and bernoulli.c, but I kept the tsm_ prefix for the methods as the names
become too long for my taste when prefixed with tablesample_.
This is a large patch... Wouldn't sampling.[c|h] extracted from
ANALYZE live better as a refactoring patch? This would limit a bit bug
occurrences on the main patch.
That's a good idea, I'll split it into patch series.
I've split sampling.c/h into a separate patch.
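For anyone who wants to see the calling pattern without reading the diff, here is a minimal standalone sketch of how the split-out block sampler can be driven. The BlockSampler_* bodies follow the code moved into sampling.c; everything prefixed with demo_ (the xorshift generator, the seeding helper) and the sample_blocks() driver are stand-ins invented for this illustration, not part of the patch, and BlockNumber is typedef'ed locally so it compiles outside the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;	/* local stand-in for the PostgreSQL typedef */

/* Same fields as BlockSamplerData in utils/sampling.h */
typedef struct
{
	BlockNumber N;				/* number of blocks, known in advance */
	int			n;				/* desired sample size */
	BlockNumber t;				/* current block number */
	int			m;				/* blocks selected so far */
} BlockSamplerData;

typedef BlockSamplerData *BlockSampler;

/* Hypothetical generator state; NOT the generator used by the patch. */
static uint32_t demo_rng_state = 1;

void
demo_setseed(uint32_t seed)
{
	demo_rng_state = (seed != 0) ? seed : 1;	/* xorshift needs nonzero state */
}

/* Returns a value strictly inside (0, 1), like sampler_random_fract(). */
double
demo_random_fract(void)
{
	demo_rng_state ^= demo_rng_state << 13;
	demo_rng_state ^= demo_rng_state >> 17;
	demo_rng_state ^= demo_rng_state << 5;
	return ((double) demo_rng_state + 1.0) / ((double) UINT32_MAX + 2.0);
}

void
BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
{
	bs->N = nblocks;
	bs->n = samplesize;
	bs->t = 0;
	bs->m = 0;
}

bool
BlockSampler_HasMore(BlockSampler bs)
{
	return (bs->t < bs->N) && (bs->m < bs->n);
}

/* Knuth's Algorithm S with one random draw per selected block. */
BlockNumber
BlockSampler_Next(BlockSampler bs)
{
	BlockNumber K = bs->N - bs->t;	/* remaining blocks */
	int			k = bs->n - bs->m;	/* blocks still to sample */
	double		p;
	double		V;

	assert(BlockSampler_HasMore(bs));

	if ((BlockNumber) k >= K)
	{
		/* need all the rest */
		bs->m++;
		return bs->t++;
	}

	V = demo_random_fract();
	p = 1.0 - (double) k / (double) K;
	while (V < p)
	{
		bs->t++;				/* skip this block */
		K--;
		p *= 1.0 - (double) k / (double) K;
	}

	bs->m++;
	return bs->t++;
}

/* Hypothetical driver: fills out[] with the sampled block numbers. */
int
sample_blocks(uint32_t seed, BlockNumber nblocks, int samplesize,
			  BlockNumber *out)
{
	BlockSamplerData bs;
	int			count = 0;

	demo_setseed(seed);
	BlockSampler_Init(&bs, nblocks, samplesize);
	while (BlockSampler_HasMore(&bs))
		out[count++] = BlockSampler_Next(&bs);
	return count;
}
```

The caller owns out[] and must size it for at least samplesize entries. Block numbers come back in ascending order, and when the table has fewer blocks than samplesize every block is returned.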
I also wrote basic CREATE/DROP TABLESAMPLE METHOD support, again as a
separate patch in the attached patch set. This also includes a test module
with a simple custom tablesample method.
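To give a rough idea of the shape a custom method takes, below is a simplified standalone model of the callback flow (init, nextblock, nexttuple, end) with a Bernoulli-style method plugged in. This is only a sketch under loud assumptions: the real callbacks are fmgr functions registered in pg_tablesample_method, whereas the SampleMethod struct, the demo_* names, the INVALID_* constants and the sample_scan_count() driver here are all hypothetical and exist only for illustration. The reset and cost callbacks are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define INVALID_BLOCK	UINT32_MAX	/* stand-in for InvalidBlockNumber */
#define INVALID_OFFSET	0			/* stand-in for InvalidOffsetNumber */

/* Hypothetical plain-C model of the sampling method callbacks. */
typedef struct SampleMethod
{
	void	   *(*init) (uint32_t seed, uint32_t nblocks, float percent);
	uint32_t	(*nextblock) (void *state);
	uint16_t	(*nexttuple) (void *state, uint32_t block, uint16_t maxoffset);
	void		(*end) (void *state);
} SampleMethod;

typedef struct
{
	uint32_t	nblocks;		/* blocks in the table */
	uint32_t	block;			/* next block to hand out */
	uint16_t	offset;			/* last offset examined in current block */
	double		prob;			/* per-tuple selection probability */
	uint32_t	rng;			/* private generator state */
} DemoBernoulliState;

/* xorshift32 draw mapped into (0, 1); illustrative generator only */
static double
demo_fract(uint32_t *s)
{
	*s ^= *s << 13;
	*s ^= *s >> 17;
	*s ^= *s << 5;
	return ((double) *s + 1.0) / ((double) UINT32_MAX + 2.0);
}

static void *
demo_bernoulli_init(uint32_t seed, uint32_t nblocks, float percent)
{
	DemoBernoulliState *st = malloc(sizeof(*st));

	if (st == NULL)
		abort();
	st->nblocks = nblocks;
	st->block = 0;
	st->offset = 0;
	st->prob = percent / 100.0;
	st->rng = (seed != 0) ? seed : 1;
	return st;
}

/* Bernoulli visits every block in order. */
static uint32_t
demo_bernoulli_nextblock(void *state)
{
	DemoBernoulliState *st = state;

	st->offset = 0;
	return (st->block < st->nblocks) ? st->block++ : INVALID_BLOCK;
}

/* Walk the block's offsets (1-based), keeping each with probability prob. */
static uint16_t
demo_bernoulli_nexttuple(void *state, uint32_t block, uint16_t maxoffset)
{
	DemoBernoulliState *st = state;

	(void) block;				/* unused in this sketch */
	while (st->offset < maxoffset)
	{
		st->offset++;
		if (demo_fract(&st->rng) < st->prob)
			return st->offset;
	}
	return INVALID_OFFSET;
}

static void
demo_bernoulli_end(void *state)
{
	free(state);
}

const SampleMethod demo_bernoulli = {
	demo_bernoulli_init,
	demo_bernoulli_nextblock,
	demo_bernoulli_nexttuple,
	demo_bernoulli_end
};

/* Hypothetical driver loop; counts the tuples the method selects. */
long
sample_scan_count(const SampleMethod *m, uint32_t seed, uint32_t nblocks,
				  uint16_t tuples_per_block, float percent)
{
	void	   *st = m->init(seed, nblocks, percent);
	long		count = 0;
	uint32_t	blk;

	while ((blk = m->nextblock(st)) != INVALID_BLOCK)
		while (m->nexttuple(st, blk, tuples_per_block) != INVALID_OFFSET)
			count++;
	m->end(st);
	return count;
}
```

Using the same seed twice gives the same selection, which is the behaviour REPEATABLE relies on. A block-level method in the style of SYSTEM would instead skip blocks in nextblock and return every offset from nexttuple.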
There are some very minor cleanups in the main tablesample code itself
but no functional changes.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-separate-block-sampling-functions.patch (text/x-diff)
From d799c85e65346615f9fa102890e3c6c6156ce92f Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:36:56 +0100
Subject: [PATCH 1/3] separate block sampling functions
---
src/backend/commands/analyze.c | 123 +++---------------------------------
src/backend/utils/misc/Makefile | 2 +-
src/backend/utils/misc/sampling.c | 130 ++++++++++++++++++++++++++++++++++++++
src/include/utils/sampling.h | 49 ++++++++++++++
4 files changed, 188 insertions(+), 116 deletions(-)
create mode 100644 src/backend/utils/misc/sampling.c
create mode 100644 src/include/utils/sampling.h
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 5de2b39..0e770b7 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -88,10 +78,6 @@ static BufferAccessStrategy vac_strategy;
static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -947,94 +933,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1089,6 +987,8 @@ acquire_sample_rows(Relation onerel, int elevel,
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
OldestXmin = GetOldestXmin(onerel, true);
+ /* Seed the sampler random number generator */
+ sampler_setseed(random());
/* Prepare for sampling block numbers */
BlockSampler_Init(&bs, totalblocks, targrows);
/* Prepare for sampling rows */
@@ -1249,7 +1149,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1308,13 +1208,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
/*
* These two routines embody Algorithm Z from "Random sampling with a
* reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
@@ -1333,7 +1226,7 @@ double
anl_init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
+ return exp(-log(sampler_random_fract()) / n);
}
double
@@ -1348,7 +1241,7 @@ anl_get_next_S(double t, int n, double *stateptr)
double V,
quot;
- V = anl_random_fract(); /* Generate V */
+ V = sampler_random_fract(); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -1380,7 +1273,7 @@ anl_get_next_S(double t, int n, double *stateptr)
tmp;
/* Generate U and X */
- U = anl_random_fract();
+ U = sampler_random_fract();
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -1409,7 +1302,7 @@ anl_get_next_S(double t, int n, double *stateptr)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 449d5b4..848ba29 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..71a91f9
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,130 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Block sampling routines shared by ANALYZE and TABLESAMPLE.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+static unsigned short _sampler_seed[3] = { 0x330e, 0xabcd, 0x1234 };
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler is used for stage one of our new two-stage tuple
+ * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
+ * "Large DB"). It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires a sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+void
+sampler_setseed(long seed)
+{
+ _sampler_seed[0] = 0x330e;
+ _sampler_seed[1] = (unsigned short) seed;
+ _sampler_seed[2] = (unsigned short) (seed >> 16);
+}
+
+/* Select a random value R uniformly distributed in [0, 1) */
+double
+sampler_random_fract(void)
+{
+ return pg_erand48(_sampler_seed);
+}
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..734cdc0
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,49 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+typedef enum SamplerAccessStrategy
+{
+ SAS_RANDOM,
+ SAS_SEQUENTIAL
+} SamplerAccessStrategy;
+
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Vitter reservoir sampling functions */
+extern double vitter_init_selection_state(int n);
+extern double vitter_get_next_S(double t, int n, double *stateptr);
+
+/* Random generator */
+extern void sampler_setseed(long seed);
+extern double sampler_random_fract(void);
+
+#endif /* SAMPLING_H */
--
1.9.1
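
For readers who want to try the block sampler outside the backend, here is a minimal standalone sketch of the single-random-draw variant of Knuth's Algorithm S that the comment in BlockSampler_Next describes. It is only an illustration under stated assumptions: all demo_* names are hypothetical, and the small LCG is a stand-in for pg_erand48(), not what the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's BlockSamplerData. */
typedef struct
{
    uint32_t N;     /* number of blocks, known in advance */
    int      n;     /* desired sample size */
    uint32_t t;     /* blocks scanned so far */
    int      m;     /* blocks selected so far */
} DemoSampler;

/* Hypothetical RNG (assumption): 64-bit LCG yielding a value in [0, 1). */
static uint64_t demo_state = 42;

static double
demo_random_fract(void)
{
    demo_state = demo_state * 6364136223846793005ULL + 1442695040888963407ULL;
    /* keep the top 53 bits and scale by 2^-53 */
    return (double) (demo_state >> 11) / 9007199254740992.0;
}

static void
demo_init(DemoSampler *bs, uint32_t nblocks, int samplesize)
{
    bs->N = nblocks;
    bs->n = samplesize;
    bs->t = 0;
    bs->m = 0;
}

static bool
demo_has_more(const DemoSampler *bs)
{
    return bs->t < bs->N && bs->m < bs->n;
}

static uint32_t
demo_next(DemoSampler *bs)
{
    uint32_t K = bs->N - bs->t;     /* remaining blocks */
    int      k = bs->n - bs->m;     /* blocks still to sample */
    double   V;
    double   p;

    if ((uint32_t) k >= K)
    {
        /* need all the rest */
        bs->m++;
        return bs->t++;
    }

    /* One draw per selected block; p shrinks with every skipped block. */
    V = demo_random_fract();
    p = 1.0 - (double) k / (double) K;
    while (V < p)
    {
        bs->t++;
        K--;                        /* keep K == N - t */
        p *= 1.0 - (double) k / (double) K;
    }

    bs->m++;
    return bs->t++;
}

/* Collect the sampled block numbers into out[]; returns how many were chosen. */
static int
demo_sample(uint32_t nblocks, int samplesize, uint32_t *out)
{
    DemoSampler bs;
    int         count = 0;

    demo_init(&bs, nblocks, samplesize);
    while (demo_has_more(&bs))
        out[count++] = demo_next(&bs);
    return count;
}
```

Calling demo_sample(100, 10, out) should fill out[] with 10 strictly increasing block numbers below 100, and a table smaller than the sample size should return every block.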
0002-tablesample-v4.patch (text/x-diff)
From 19b0cde3a84841790eb8e10c3eacd21b581aa4a3 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:37:55 +0100
Subject: [PATCH 2/3] tablesample v4
---
doc/src/sgml/ref/select.sgml | 34 ++-
src/backend/access/Makefile | 3 +-
src/backend/catalog/Makefile | 2 +-
src/backend/commands/explain.c | 7 +
src/backend/executor/Makefile | 2 +-
src/backend/executor/execAmi.c | 8 +
src/backend/executor/execCurrent.c | 1 +
src/backend/executor/execProcnode.c | 14 +
src/backend/executor/nodeSamplescan.c | 404 ++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 57 ++++
src/backend/nodes/equalfuncs.c | 34 +++
src/backend/nodes/nodeFuncs.c | 12 +
src/backend/nodes/outfuncs.c | 58 ++++
src/backend/nodes/readfuncs.c | 42 +++
src/backend/optimizer/path/allpaths.c | 35 +++
src/backend/optimizer/path/costsize.c | 67 +++++
src/backend/optimizer/plan/createplan.c | 69 +++++
src/backend/optimizer/plan/setrefs.c | 11 +
src/backend/optimizer/plan/subselect.c | 1 +
src/backend/optimizer/util/pathnode.c | 29 ++
src/backend/parser/gram.y | 40 ++-
src/backend/parser/parse_clause.c | 38 ++-
src/backend/parser/parse_func.c | 128 +++++++++
src/backend/utils/Makefile | 3 +-
src/backend/utils/adt/ruleutils.c | 50 ++++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/tablesample/Makefile | 17 ++
src/backend/utils/tablesample/bernoulli.c | 199 ++++++++++++++
src/backend/utils/tablesample/system.c | 186 +++++++++++++
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_proc.h | 25 ++
src/include/catalog/pg_tablesample_method.h | 70 +++++
src/include/executor/nodeSamplescan.h | 24 ++
src/include/nodes/execnodes.h | 20 ++
src/include/nodes/nodes.h | 5 +
src/include/nodes/parsenodes.h | 33 +++
src/include/nodes/plannodes.h | 6 +
src/include/nodes/relation.h | 12 +
src/include/optimizer/cost.h | 1 +
src/include/optimizer/pathnode.h | 2 +
src/include/parser/kwlist.h | 3 +-
src/include/parser/parse_func.h | 4 +
src/include/utils/rel.h | 1 -
src/include/utils/sampling.h | 4 -
src/include/utils/syscache.h | 2 +
src/include/utils/tablesample.h | 27 ++
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/expected/tablesample.out | 77 ++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/tablesample.sql | 17 ++
51 files changed, 1899 insertions(+), 17 deletions(-)
create mode 100644 src/backend/executor/nodeSamplescan.c
create mode 100644 src/backend/utils/tablesample/Makefile
create mode 100644 src/backend/utils/tablesample/bernoulli.c
create mode 100644 src/backend/utils/tablesample/system.c
create mode 100644 src/include/catalog/pg_tablesample_method.h
create mode 100644 src/include/executor/nodeSamplescan.h
create mode 100644 src/include/utils/tablesample.h
create mode 100644 src/test/regress/expected/tablesample.out
create mode 100644 src/test/regress/sql/tablesample.sql
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..250ae29 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,38 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ The <literal>TABLESAMPLE</literal> clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of the rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ one of:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept only a single argument,
+ which is the percentage (floating point from 0 to 100) of the rows to
+ be returned.
+ The <literal>SYSTEM</literal> sampling method does block-level
+ sampling with each block having the same chance of being selected and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> sampling method scans the whole table
+ and returns individual rows with equal probability.
+ The optional numeric parameter of <literal>REPEATABLE</literal> is used
+ as the random seed for sampling.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..238057a 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ transam
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..34db3e6 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesample_method.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8a0be5d..5152964 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -724,6 +724,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -950,6 +951,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ pname = sname = "Sample Scan";
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1067,6 +1071,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1319,6 +1324,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2147,6 +2153,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 6ebad2f..4948a26 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index 1c8be25..5cfe549 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 9892499..03c2feb 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..27f5f05
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,404 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/pg_tablesample_method.h"
+#include "executor/executor.h"
+#include "executor/nodeSamplescan.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate, int eflags);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+
+/*
+ * Initialize the sampling method - loads function info and
+ * calls the tsminit function.
+ *
+ * We need special handling for this because the tsminit function
+ * is allowed to take optional additional arguments.
+ */
+static void
+InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(scanstate->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(scanstate->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(scanstate->tsmnexttuple));
+ fmgr_info(tablesample->tsmend, &(scanstate->tsmend));
+ fmgr_info(tablesample->tsmreset, &(scanstate->tsmreset));
+
+ InitFunctionCallInfoData(fcinfo, &scanstate->tsminit,
+ list_length(args) + 2,
+ InvalidOid, NULL, NULL);
+
+ /* First arg is always SampleScanState */
+ fcinfo.arg[0] = PointerGetDatum(scanstate);
+ fcinfo.argnull[0] = false;
+
+ /*
+ * Second arg is always REPEATABLE
+ * When tablesample->repeatable is NULL then REPEATABLE clause was not
+ * specified.
+ * When specified, the expression cannot evaluate to NULL.
+ */
+ if (tablesample->repeatable)
+ {
+ ExprState *argstate = ExecInitExpr((Expr *) tablesample->repeatable,
+ (PlanState *) scanstate);
+ fcinfo.arg[1] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[1], NULL);
+ if (fcinfo.argnull[1])
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("REPEATABLE clause cannot be NULL")));
+ }
+ else
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+
+ i = 2;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ (void) FunctionCallInvoke(&fcinfo);
+}
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is a workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ EState *estate;
+ TupleTableSlot *slot;
+ BlockNumber blockno = InvalidBlockNumber;
+ Snapshot snapshot;
+ Relation relation;
+ bool found = false;
+ bool retry = false;
+ OffsetNumber tupoffset, maxoffset;
+ Buffer buffer;
+ Page page;
+ HeapTuple tuple = &(node->tup);
+
+ /*
+ * get information from the estate and scan state
+ */
+ estate = node->ss.ps.state;
+ snapshot = estate->es_snapshot;
+ slot = node->ss.ss_ScanTupleSlot;
+ relation = node->ss.ss_currentRelation;
+ buffer = node->openbuffer;
+
+ if (BufferIsValid(buffer))
+ {
+ blockno = BufferGetBlockNumber(buffer);
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ /*
+ * get the next tuple from the table
+ */
+ for (;;)
+ {
+ ItemId itemid;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Load next block if needed. */
+ if (!BufferIsValid(buffer))
+ {
+ blockno = DatumGetInt32(FunctionCall2(&node->tsmnextblock,
+ PointerGetDatum(node),
+ BoolGetDatum(retry)));
+ /* No more blocks to fetch */
+ if (!BlockNumberIsValid(blockno))
+ break;
+
+ buffer = ReadBufferExtended(relation, MAIN_FORKNUM, blockno,
+ RBM_NORMAL, NULL);
+ LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+ node->openbuffer = buffer;
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ tupoffset = DatumGetUInt16(FunctionCall4(&node->tsmnexttuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset),
+ BoolGetDatum(retry)));
+ /* Go to next block. */
+ if (!OffsetNumberIsValid(tupoffset))
+ {
+ UnlockReleaseBuffer(buffer);
+ node->openbuffer = buffer = InvalidBuffer;
+ continue;
+ }
+ retry = true;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_tableOid = RelationGetRelid(relation);
+ tuple->t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&tuple->t_self, blockno, tupoffset);
+
+ /* Found visible tuple, return it. */
+ if (HeapTupleSatisfiesVisibility(tuple, snapshot, buffer))
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (found)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ buffer, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation using the sampling method and returns
+ * the next qualifying tuple.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+ node->ss.ss_currentScanDesc = NULL;
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ scanstate->openbuffer = InvalidBuffer;
+
+ InitSamplingMethod(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell sampling function that we finished the scan.
+ */
+ FunctionCall1(&node->tsmend, PointerGetDatum(node));
+
+ if (BufferIsValid(node->openbuffer))
+ {
+ UnlockReleaseBuffer(node->openbuffer);
+ node->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *scanstate)
+{
+ if (BufferIsValid(scanstate->openbuffer))
+ {
+ UnlockReleaseBuffer(scanstate->openbuffer);
+ scanstate->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ FunctionCall1(&scanstate->tsmreset, PointerGetDatum(scanstate));
+
+ ExecScanReScan(&scanstate->ss);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f1a24f5..8861512 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -628,6 +628,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2006,6 +2022,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2138,6 +2155,37 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_SCALAR_FIELD(tsmcost);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4075,6 +4123,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4723,6 +4774,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_PrivGrantee:
retval = _copyPrivGrantee(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 6e8b308..1e7ebbf 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2323,6 +2323,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2442,6 +2443,33 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_SCALAR_FIELD(tsmcost);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3150,6 +3178,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_PrivGrantee:
retval = _equalPrivGrantee(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 21dfda7..bd9ce09 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3209,6 +3209,18 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->repeatable, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index dd1278b..684cd7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -578,6 +578,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -1589,6 +1597,17 @@ _outTidPath(StringInfo str, const TidPath *node)
}
static void
+_outSamplePath(StringInfo str, const SamplePath *node)
+{
+ WRITE_NODE_TYPE("SAMPLEPATH");
+
+ _outPathInfo(str, (const Path *) node);
+
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(tsmargs);
+}
+
+static void
_outForeignPath(StringInfo str, const ForeignPath *node)
{
WRITE_NODE_TYPE("FOREIGNPATH");
@@ -2391,6 +2410,33 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2420,6 +2466,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2887,6 +2934,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3092,6 +3142,9 @@ _outNode(StringInfo str, const void *obj)
case T_TidPath:
_outTidPath(str, obj);
break;
+ case T_SamplePath:
+ _outSamplePath(str, obj);
+ break;
case T_ForeignPath:
_outForeignPath(str, obj);
break;
@@ -3228,6 +3280,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index ae24d05..fe08107 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,43 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_OID_FIELD(tsmcost);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1216,6 +1253,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1311,6 +1349,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..c18973c 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,8 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +425,34 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+
+ /*
+ * We don't support pushing join clauses into the quals of a sample scan,
+ * but it could still have required parameterization due to LATERAL refs
+ * in its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
+
+ /*
+ * There is only one plan to consider but we still need to set
+ * parameters for RelOptInfo.
+ */
+ set_cheapest(rel);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 020558b..8f9d41e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -88,6 +88,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/parsetree.h"
#include "utils/lsyscache.h"
+#include "utils/sampling.h"
#include "utils/selfuncs.h"
#include "utils/spccache.h"
#include "utils/tuplesort.h"
@@ -219,6 +220,72 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * From the planner/optimizer perspective, we don't care all that much about
+ * the cost itself, since there is always only one scan path to consider when
+ * a sampling scan is present, but the row count estimate is still important.
+ *
+ * 'path' is the sample path to cost (parameterized if its param_info is set)
+ * 'baserel' is the relation to be scanned
+ */
+void
+cost_samplescan(SamplePath *path, PlannerInfo *root, RelOptInfo *baserel)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+ BlockNumber pages;
+ double tuples;
+ SamplerAccessStrategy strategy;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Call the sampling method's costing function. */
+ OidFunctionCall6(path->tsmcost, PointerGetDatum(root),
+ PointerGetDatum(path), PointerGetDatum(baserel),
+ PointerGetDatum(&pages), PointerGetDatum(&tuples),
+ PointerGetDatum(&strategy));
+
+ /* Mark the path with the correct row estimate */
+ if (path->path.param_info)
+ path->path.rows = path->path.param_info->ppi_rows;
+ else
+ path->path.rows = tuples;
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+
+ spc_page_cost = strategy == SAS_RANDOM ?
+ spc_random_page_cost : spc_seq_page_cost;
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_page_cost * pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, path->path.param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * tuples;
+
+ path->path.startup_cost = startup_cost;
+ path->path.total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 655be81..10a5e02 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -58,6 +58,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -100,6 +102,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -228,6 +231,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -343,6 +347,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -546,6 +557,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1133,6 +1145,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3318,6 +3369,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7703946..de33fc6 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -446,6 +446,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 78fb6b1..191624c 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2163,6 +2163,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 1395a21..6206c60 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,33 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+SamplePath *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ SamplePath *pathnode = makeNode(SamplePath);
+ RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
+ TableSampleClause *tablesample = rte->tablesample;
+
+ Assert(tablesample);
+
+ pathnode->path.pathtype = T_SampleScan;
+ pathnode->path.parent = rel;
+ pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->path.pathkeys = NIL; /* samplescan has unordered result */
+
+ pathnode->tsmcost = tablesample->tsmcost;
+ pathnode->tsmargs = tablesample->args;
+
+ cost_samplescan(pathnode, root, rel);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1921,6 +1948,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return (Path *) create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 679e1bb..01d72d4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -447,6 +447,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -611,8 +612,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10227,6 +10228,12 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample opt_alias_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10522,7 +10529,6 @@ relation_expr_list:
| relation_expr_list ',' relation_expr { $$ = lappend($1, $3); }
;
-
/*
* Given "UPDATE foo set set ...", we have to decide without looking any
* further ahead whether the first "set" is an alias or the UPDATE's SET
@@ -10552,6 +10558,31 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $2;
+ n->relation = $1;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = $4;
+ n->repeatable = $6;
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13334,7 +13365,6 @@ unreserved_keyword:
| RELATIVE_P
| RELEASE
| RENAME
- | REPEATABLE
| REPLACE
| REPLICA
| RESET
@@ -13509,6 +13539,7 @@ type_func_name_keyword:
| OVERLAPS
| RIGHT
| SIMILAR
+ | TABLESAMPLE
| VERBOSE
;
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+ | REPEATABLE
| RETURNING
| SELECT
| SESSION_USER
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 654dce6..03632d2 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -413,6 +416,19 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
+{
+ RangeTblEntry *rte;
+ TableSampleClause *tablesample = NULL;
+
+ rte = transformTableEntry(pstate, r->relation);
+ tablesample = ParseTableSample(pstate, r->method, r->repeatable, r->args);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -421,7 +437,7 @@ transformTableEntry(ParseState *pstate, RangeVar *r)
{
RangeTblEntry *rte;
- /* We need only build a range table entry */
+ /* We first need to build a range table entry */
rte = addRangeTableEntry(pstate, r, r->alias,
interpretInhOption(r->inhOpt), true);
@@ -1121,6 +1137,26 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index a200804..690d0fa 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesample_method.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -760,6 +762,132 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
+ List *sampleargs)
+{
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, initnargs;
+ Oid actual_arg_types[FUNC_MAX_ARGS];
+ Oid init_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ samplemethod)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+ tablesample->tsmcost = tsm->tsmcost;
+
+ ReleaseSysCache(tuple);
+
+ /* Validate the parameters against init function definition. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ initnargs = procform->pronargs;
+ Assert(initnargs >= 3);
+
+ /*
+ * First parameter is used to pass the SampleScanState, second is
+ * seed (REPEATABLE), skip the processing for them here, just assert
+ * that the types are correct.
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ Assert(procform->proargtypes.values[1] == INT4OID);
+ initnargs -= 2;
+ memcpy(init_arg_types, procform->proargtypes.values + 2,
+ initnargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Process repeatable (seed) */
+ if (repeatable != NULL)
+ {
+ Node *arg = repeatable;
+
+ if (arg && IsA(arg, A_Const))
+ {
+ A_Const *con = (A_Const *) arg;
+
+ if (con->val.type == T_Null)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value"),
+ parser_errposition(pstate, con->location)));
+
+ }
+
+ arg = transformExpr(pstate, arg, EXPR_KIND_FROM_FUNCTION);
+ arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
+ tablesample->repeatable = arg;
+ }
+ else
+ tablesample->repeatable = NULL;
+
+ /* Transform the rest of arguments ... */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *arg = transformExpr(pstate, (Node *) lfirst(larg), EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ fargs = lappend(fargs, arg);
+
+ actual_arg_types[nargs++] = argtype;
+ }
+
+ /*
+ * Check if parameters are correct.
+ *
+ * XXX: can we do better at hinting here?
+ */
+ if (initnargs != nargs ||
+ !can_coerce_type(initnargs, actual_arg_types, init_arg_types,
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameters for tablesample method \"%s\"",
+ samplemethod)));
+
+ /* perform the necessary typecasting of arguments */
+ make_fn_arguments(pstate, fargs, actual_arg_types, init_arg_types);
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..9daa2ae 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,8 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time \
+ tablesample
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index dd748ac..4f1c534 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -31,6 +31,7 @@
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -343,6 +344,8 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags);
static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags, int wrapColumn);
+static void get_tablesample_def(TableSampleClause *tablesample,
+ deparse_context *context);
static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
TupleDesc resultDesc,
int prettyFlags, int wrapColumn, int startIndent);
@@ -4157,6 +4160,50 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
heap_close(ev_relation, AccessShareLock);
}
+/* ----------
+ * get_tablesample_def - Convert TableSampleClause back to SQL
+ * ----------
+ */
+static void
+get_tablesample_def(TableSampleClause *tablesample, deparse_context *context)
+{
+ StringInfo buf = context->buf;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ char *tsmname;
+ int nargs;
+ ListCell *l;
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tablesample->tsmid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("cache lookup failed for tablesample method %u",
+ tablesample->tsmid)));
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ tsmname = NameStr(tsm->tsmname);
+ appendStringInfo(buf, " TABLESAMPLE %s (", quote_identifier(tsmname));
+
+ ReleaseSysCache(tuple);
+
+ nargs = 0;
+ foreach(l, tablesample->args)
+ {
+ if (nargs++ > 0)
+ appendStringInfoString(buf, ", ");
+ get_rule_expr((Node *) lfirst(l), context, true);
+ }
+ appendStringInfoChar(buf, ')');
+
+ if (tablesample->repeatable != NULL)
+ {
+ appendStringInfoString(buf, " REPEATABLE (");
+ get_rule_expr(tablesample->repeatable, context, true);
+ appendStringInfoChar(buf, ')');
+ }
+}
/* ----------
* get_query_def - Parse back one query parsetree
@@ -8384,6 +8431,9 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
only_marker(rte),
generate_relation_name(rte->relid,
context->namespaces));
+
+ if (rte->tablesample)
+ get_tablesample_def(rte->tablesample, context);
break;
case RTE_SUBQUERY:
/* Subquery RTE */
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..3a8f01e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesample_method_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/tablesample/Makefile b/src/backend/utils/tablesample/Makefile
new file mode 100644
index 0000000..df92939
--- /dev/null
+++ b/src/backend/utils/tablesample/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/tablesample
+#
+# IDENTIFICATION
+# src/backend/utils/tablesample/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = system.o bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/tablesample/bernoulli.c b/src/backend/utils/tablesample/bernoulli.c
new file mode 100644
index 0000000..1c7808d
--- /dev/null
+++ b/src/backend/utils/tablesample/bernoulli.c
@@ -0,0 +1,199 @@
+/*-------------------------------------------------------------------------
+ *
+ * bernoulli.c
+ * interface routines for BERNOULLI tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ float4 samplesize; /* fraction of tuples to return (0.0-1.0) */
+ OffsetNumber lt; /* last tuple returned from current block */
+} BernoulliSamplerData;
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ Relation rel = scanstate->ss.ss_currentRelation;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks = RelationGetNumberOfBlocks(rel);
+ sampler->blockno = InvalidBlockNumber;
+ sampler->samplesize = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+
+ sampler_setseed(seed);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = 0;
+ else if (++sampler->blockno >= sampler->tblocks)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ *
+ * This method implements the main logic of Bernoulli sampling.
+ * The algorithm simply generates a new random number (in the 0.0-1.0 range)
+ * and, if it falls within the user-specified probability (in the same range),
+ * returns the tuple offset.
+ *
+ * If we reach the end of the block, return InvalidOffsetNumber, which tells
+ * SampleScan to go to the next block.
+ */
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ double samplesize = sampler->samplesize;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /* Every tuple has a samplesize (percent / 100) chance of being returned */
+ while (sampler_random_fract() > samplesize)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler = (BernoulliSamplerData *) scanstate->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_setseed(sampler->seed);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_bernoulli_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ SamplePath *path = (SamplePath *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(3);
+ double *tuples = (double *) PG_GETARG_POINTER(4);
+ List *args = path->tsmargs;
+ Node *pctnode;
+ float4 samplesize;
+
+ SamplerAccessStrategy *strategy = (SamplerAccessStrategy *) PG_GETARG_POINTER(5);
+
+ *strategy = SAS_SEQUENTIAL;
+ *pages = baserel->pages;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (!IsA(pctnode, Const))
+ {
+ *tuples = baserel->tuples * 0.1;
+ PG_RETURN_VOID();
+ }
+
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+
+ *tuples = baserel->tuples * samplesize;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/backend/utils/tablesample/system.c b/src/backend/utils/tablesample/system.c
new file mode 100644
index 0000000..83a7ccf
--- /dev/null
+++ b/src/backend/utils/tablesample/system.c
@@ -0,0 +1,186 @@
+/*-------------------------------------------------------------------------
+ *
+ * system.c
+ * interface routines for system tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/*
+ * State
+ */
+typedef struct
+{
+ BlockSamplerData bs;
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ int samplesize; /* number of blocks to return */
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be a numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks = RelationGetNumberOfBlocks(scanstate->ss.ss_currentRelation);
+ sampler->samplesize = 1 + (int) (sampler->tblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ sampler_setseed(seed);
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses the same logic as ANALYZE for picking the random blocks.
+ */
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ sampler_setseed(sampler->seed);
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ SamplePath *path = (SamplePath *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(3);
+ double *tuples = (double *) PG_GETARG_POINTER(4);
+ List *args = path->tsmargs;
+ Node *pctnode;
+ float4 percent;
+
+ SamplerAccessStrategy *strategy = (SamplerAccessStrategy *) PG_GETARG_POINTER(5);
+
+ *strategy = SAS_RANDOM;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (!IsA(pctnode, Const))
+ {
+ *pages = baserel->pages * 0.1;
+ *tuples = baserel->tuples * 0.1;
+ PG_RETURN_VOID();
+ }
+
+ percent = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ percent /= 100.0;
+
+ *pages = baserel->pages * percent;
+ *tuples = baserel->tuples * percent;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..c711cca 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_name_index, 3281, on pg_tablesample_method using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3281
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_oid_index, 3282, on pg_tablesample_method using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3282
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 9edfdb8..a0f97ac 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5143,6 +5143,31 @@ DESCR("rank of hypothetical row without gaps");
DATA(insert OID = 3993 ( dense_rank_final PGNSP PGUID 12 1 0 2276 0 f f f f f f i 2 0 20 "2281 2276" "{2281,2276}" "{i,v}" _null_ _null_ hypothetical_dense_rank_final _null_ _null_ _null_ ));
DESCR("aggregate final function");
+DATA(insert OID = 3285 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3286 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3287 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3288 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3289 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+DATA(insert OID = 3290 ( tsm_system_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 6 0 2278 "2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_system_cost _null_ _null_ _null_ ));
+DESCR("tsm_system_cost(internal)");
+
+DATA(insert OID = 3291 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3292 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3293 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3294 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3296 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
+DATA(insert OID = 3297 ( tsm_bernoulli_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 6 0 2278 "2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_cost(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
new file mode 100644
index 0000000..0e4a716
--- /dev/null
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesample_method.h
+ * definition of the table sample methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesample_method.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLE_METHOD_H
+#define PG_TABLESAMPLE_METHOD_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_tablesample_method definition. cpp turns this into
+ * typedef struct FormData_pg_tablesample_method
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3280
+
+CATALOG(pg_tablesample_method,3280)
+{
+ NameData tsmname; /* tablesample method name */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockNumber if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmend; /* end scan function */
+ regproc tsmreset; /* reset state - used by rescan */
+ regproc tsmcost; /* costing function */
+} FormData_pg_tablesample_method;
+
+/* ----------------
+ * Form_pg_tablesample_method corresponds to a pointer to a tuple with
+ * the format of pg_tablesample_method relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
+
+/* ----------------
+ * compiler constants for pg_tablesample_method
+ * ----------------
+ */
+#define Natts_pg_tablesample_method 7
+#define Anum_pg_tablesample_method_tsmname 1
+#define Anum_pg_tablesample_method_tsminit 2
+#define Anum_pg_tablesample_method_tsmnextblock 3
+#define Anum_pg_tablesample_method_tsmnexttuple 4
+#define Anum_pg_tablesample_method_tsmend 5
+#define Anum_pg_tablesample_method_tsmreset 6
+#define Anum_pg_tablesample_method_tsmcost 7
+
+/* ----------------
+ * initial contents of pg_tablesample_method
+ * ----------------
+ */
+
+DATA(insert OID = 3283 ( system tsm_system_init tsm_system_nextblock tsm_system_nexttuple tsm_system_end tsm_system_reset tsm_system_cost ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3284 ( bernoulli tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41288ed..43e1a30 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1212,6 +1212,26 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+
+ /* Sampling method functions. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+
+ Buffer openbuffer; /* currently open buffer */
+ HeapTupleData tup; /* last tuple */
+
+ void *tsmdata; /* for use by table sample method */
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 97ef0fc..99ac985 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,6 +61,7 @@ typedef enum NodeTag
T_ValuesScan,
T_CteScan,
T_WorkTableScan,
+ T_SampleScan,
T_ForeignScan,
T_CustomScan,
T_Join,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -225,6 +227,7 @@ typedef enum NodeTag
T_MergePath,
T_HashPath,
T_TidPath,
+ T_SamplePath,
T_ForeignPath,
T_CustomPath,
T_AppendPath,
@@ -413,6 +416,8 @@ typedef enum NodeTag
T_XmlSerialize,
T_WithClause,
T_CommonTableExpr,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b1dfa85..d87343f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -307,6 +307,23 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - information about a sampling method
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmend;
+ Oid tsmreset;
+ Oid tsmcost;
+ Node *repeatable;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -507,6 +524,21 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than the SQL standard, so we pass generic function
+ * arguments to the sampling method.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ Node *repeatable;
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -751,6 +783,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 316c9ce..8a2a146 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -278,6 +278,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6845a40..67c3b1f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -870,6 +870,18 @@ typedef struct TidPath
} TidPath;
/*
+ * SamplePath represents a sample scan
+ *
+ * tsmargs is the list of parameters for the TABLESAMPLE clause
+ */
+typedef struct SamplePath
+{
+ Path path;
+ Oid tsmcost; /* table sample method costing function */
+ List *tsmargs; /* arguments to a TABLESAMPLE clause */
+} SamplePath;
+
+/*
* ForeignPath represents a potential scan of a foreign table
*
* fdw_private stores FDW private data about the scan. While fdw_private is
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..3777054 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(SamplePath *path, PlannerInfo *root, RelOptInfo *baserel);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9923f0e..dfb580e 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern SamplePath *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c243ec..6ff7b44 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -312,7 +312,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD)
PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD)
PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD)
-PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD)
+PG_KEYWORD("repeatable", REPEATABLE, RESERVED_KEYWORD)
PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD)
PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD)
PG_KEYWORD("reset", RESET, UNRESERVED_KEYWORD)
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 3264691..6727e55 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,10 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod,
+ Node *repeatable, List *args);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 6bd786d..185bd81 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 734cdc0..3098ab4 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -38,10 +38,6 @@ extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
extern bool BlockSampler_HasMore(BlockSampler bs);
extern BlockNumber BlockSampler_Next(BlockSampler bs);
-/* Vitter reservoir sampling functions */
-extern double vitter_init_selection_state(int n);
-extern double vitter_get_next_S(double t, int n, double *stateptr);
-
/* Random generator */
extern void sampler_setseed(long seed);
extern double sampler_random_fract(void);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..6b628f6 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/include/utils/tablesample.h b/src/include/utils/tablesample.h
new file mode 100644
index 0000000..1a24cb6
--- /dev/null
+++ b/src/include/utils/tablesample.h
@@ -0,0 +1,27 @@
+/*--------------------------------------------------------------------------
+ * tablesample.h
+ * Header file for builtin table sampling methods.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/utils/tablesample.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TABLESAMPLE_H
+#define TABLESAMPLE_H
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_system_cost(PG_FUNCTION_ARGS);
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_cost(PG_FUNCTION_ARGS);
+
+#endif /* TABLESAMPLE_H */
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..5946edf 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesample_method|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..436b754
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,77 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+ count
+-------
+ 10
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+ pg_get_viewdef
+--------------------------------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system (((10 * 2))::real) REPEATABLE (2);
+(1 row)
+
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+ pg_get_viewdef
+-----------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system ((99)::real);
+(1 row)
+
+DROP TABLE test_tablesample CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to view test_tablesample_v1
+drop cascades to view test_tablesample_v2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 62ef6ec..2e1b200 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity object_address
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity object_address tablesample
# rowsecurity creates an event trigger, so don't run it in parallel
test: rowsecurity
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index b491b97..2d74c9f 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -152,3 +152,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..e8313a3
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,17 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+
+DROP TABLE test_tablesample CASCADE;
--
1.9.1
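
For readers who want to try the per-tuple logic of tsm_bernoulli_nexttuple
outside the backend, here is a minimal standalone sketch. It is not part of
the patch: the demo_* names and the small LCG are hypothetical stand-ins for
BernoulliSamplerData, sampler_setseed() and sampler_random_fract(), and
offset 0 plays the role of InvalidOffsetNumber. Unlike the patch, the sketch
compares with >= so that a fraction of 0 never selects a tuple.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's sampler state; not PostgreSQL API. */
typedef struct
{
    uint64_t rng_state;   /* state of the illustrative generator */
    double   fraction;    /* selection probability in [0, 1] */
    unsigned last;        /* last offset returned; 0 means "none yet" */
} demo_bernoulli;

/* Illustrative 64-bit LCG returning a value in [0, 1). */
static double
demo_random_fract(demo_bernoulli *s)
{
    s->rng_state = s->rng_state * 6364136223846793005ULL
                   + 1442695040888963407ULL;
    /* keep the top 53 bits and scale by 2^53 */
    return (double) (s->rng_state >> 11) / 9007199254740992.0;
}

/* Seed the generator and convert the percentage into a probability. */
static void
demo_init(demo_bernoulli *s, uint64_t seed, double percent)
{
    assert(percent >= 0.0 && percent <= 100.0);
    s->rng_state = seed;
    s->fraction = percent / 100.0;
    s->last = 0;
}

/*
 * Return the next selected offset in 1..maxoffset, or 0 once the block is
 * exhausted.  One random draw is made per candidate offset, so each tuple
 * is selected independently with probability s->fraction.
 */
static unsigned
demo_next_tuple(demo_bernoulli *s, unsigned maxoffset)
{
    unsigned off = s->last + 1;

    while (off <= maxoffset && demo_random_fract(s) >= s->fraction)
        off++;

    if (off > maxoffset)
        off = 0;          /* caller should move on to the next block */

    s->last = off;
    return off;
}
```

Calling demo_init() twice with the same seed and then walking the same blocks
yields the same offsets, which is the property REPEATABLE relies on.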
0003-tablesample-ddl.patch (text/x-diff)
From ec16f7e1dfe75f4f019a42801f60818c7c37bdcd Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:51:44 +0100
Subject: [PATCH 3/3] tablesample-ddl
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_tablesamplemethod.sgml | 149 ++++++++
doc/src/sgml/ref/drop_tablesamplemethod.sgml | 87 +++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/dependency.c | 15 +-
src/backend/catalog/objectaddress.c | 65 +++-
src/backend/commands/Makefile | 6 +-
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/tablecmds.c | 1 +
src/backend/commands/tablesample.c | 398 +++++++++++++++++++++
src/backend/parser/gram.y | 14 +-
src/backend/tcop/utility.c | 12 +
src/include/catalog/dependency.h | 1 +
src/include/catalog/pg_tablesample_method.h | 11 +
src/include/nodes/parsenodes.h | 1 +
src/include/nodes/relation.h | 2 +-
src/include/parser/kwlist.h | 1 +
src/test/modules/Makefile | 3 +-
src/test/modules/tablesample/.gitignore | 4 +
src/test/modules/tablesample/Makefile | 21 ++
.../modules/tablesample/expected/tablesample.out | 39 ++
src/test/modules/tablesample/sql/tablesample.sql | 14 +
src/test/modules/tablesample/tsm_test--1.0.sql | 44 +++
src/test/modules/tablesample/tsm_test.c | 179 +++++++++
src/test/modules/tablesample/tsm_test.control | 5 +
26 files changed, 1074 insertions(+), 9 deletions(-)
create mode 100644 doc/src/sgml/ref/create_tablesamplemethod.sgml
create mode 100644 doc/src/sgml/ref/drop_tablesamplemethod.sgml
create mode 100644 src/backend/commands/tablesample.c
create mode 100644 src/test/modules/tablesample/.gitignore
create mode 100644 src/test/modules/tablesample/Makefile
create mode 100644 src/test/modules/tablesample/expected/tablesample.out
create mode 100644 src/test/modules/tablesample/sql/tablesample.sql
create mode 100644 src/test/modules/tablesample/tsm_test--1.0.sql
create mode 100644 src/test/modules/tablesample/tsm_test.c
create mode 100644 src/test/modules/tablesample/tsm_test.control
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 7aa3128..2fad084 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -78,6 +78,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createServer SYSTEM "create_server.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
+<!ENTITY createTablesampleMethod SYSTEM "create_tablesamplemethod.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
<!ENTITY createTrigger SYSTEM "create_trigger.sgml">
<!ENTITY createTSConfig SYSTEM "create_tsconfig.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
+<!ENTITY dropTablesampleMethod SYSTEM "drop_tablesamplemethod.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTrigger SYSTEM "drop_trigger.sgml">
<!ENTITY dropTSConfig SYSTEM "drop_tsconfig.sgml">
diff --git a/doc/src/sgml/ref/create_tablesamplemethod.sgml b/doc/src/sgml/ref/create_tablesamplemethod.sgml
new file mode 100644
index 0000000..70720e5
--- /dev/null
+++ b/doc/src/sgml/ref/create_tablesamplemethod.sgml
@@ -0,0 +1,149 @@
+<!--
+doc/src/sgml/ref/create_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATETABLESAMPLEMETHOD">
+ <indexterm zone="sql-createtablesamplemethod">
+ <primary>CREATE TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE TABLESAMPLE METHOD</refname>
+ <refpurpose>define a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE TABLESAMPLE METHOD <replaceable class="parameter">name</replaceable> (
+ INIT = <replaceable class="parameter">init_function</replaceable> ,
+ NEXTBLOCK = <replaceable class="parameter">nextblock_function</replaceable> ,
+ NEXTTUPLE = <replaceable class="parameter">nexttuple_function</replaceable> ,
+ END = <replaceable class="parameter">end_function</replaceable> ,
+ RESET = <replaceable class="parameter">reset_function</replaceable> ,
+ COST = <replaceable class="parameter">cost_function</replaceable>
+)
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>CREATE TABLESAMPLE METHOD</command> creates a tablesample method.
+ A tablesample method provides an algorithm for reading a sample of a table
+ when used in the <command>TABLESAMPLE</> clause of a <command>SELECT</>
+ statement.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>CREATE TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of the tablesample method to be created. This name must be
+ unique within the database.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">init_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the init function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nextblock_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-block function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nexttuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-tuple function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">end_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the end function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">reset_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the reset function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">cost_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the costing function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>
+ The function names can be schema-qualified if necessary. Argument types
+ are not given, since the argument list for each type of function is
+ predetermined. All functions are required.
+ </para>
+
+ <para>
+ The arguments can appear in any order, not only the one shown above.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>CREATE TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-droptablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_tablesamplemethod.sgml b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
new file mode 100644
index 0000000..dffd2ec
--- /dev/null
+++ b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
@@ -0,0 +1,87 @@
+<!--
+doc/src/sgml/ref/drop_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPTABLESAMPLEMETHOD">
+ <indexterm zone="sql-droptablesamplemethod">
+ <primary>DROP TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP TABLESAMPLE METHOD</refname>
+ <refpurpose>remove a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP TABLESAMPLE METHOD [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP TABLESAMPLE METHOD</command> drops an existing tablesample
+ method.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>DROP TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the tablesample method does not exist.
+ A notice is issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an existing tablesample method to be removed.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>DROP TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createtablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 10c9a6d..2c09a3c 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -106,6 +106,7 @@
&createServer;
&createTable;
&createTableAs;
+ &createTablesampleMethod;
&createTableSpace;
&createTSConfig;
&createTSDictionary;
@@ -147,6 +148,7 @@
&dropSequence;
&dropServer;
&dropTable;
+ &dropTablesampleMethod;
&dropTableSpace;
&dropTSConfig;
&dropTSDictionary;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index bacb242..6acb5b3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -46,6 +46,7 @@
#include "catalog/pg_policy.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -157,7 +158,8 @@ static const Oid object_classes[MAX_OCLASS] = {
DefaultAclRelationId, /* OCLASS_DEFACL */
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
- PolicyRelationId /* OCLASS_POLICY */
+ PolicyRelationId, /* OCLASS_POLICY */
+ TableSampleMethodRelationId /* OCLASS_TABLESAMPLEMETHOD */
};
@@ -1265,6 +1267,10 @@ doDeletion(const ObjectAddress *object, int flags)
RemovePolicyById(object->objectId);
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ RemoveTablesampleMethodById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -1794,6 +1800,10 @@ find_expr_references_walker(Node *node,
case RTE_RELATION:
add_object_address(OCLASS_CLASS, rte->relid, 0,
context->addrs);
+ if (rte->tablesample)
+ add_object_address(OCLASS_TABLESAMPLEMETHOD,
+ rte->tablesample->tsmid, 0,
+ context->addrs);
break;
default:
break;
@@ -2373,6 +2383,9 @@ getObjectClass(const ObjectAddress *object)
case PolicyRelationId:
return OCLASS_POLICY;
+
+ case TableSampleMethodRelationId:
+ return OCLASS_TABLESAMPLEMETHOD;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 825d8b2..02edc0a 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -44,6 +44,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_policy.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -429,7 +430,19 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
- }
+ },
+ {
+ TableSampleMethodRelationId,
+ TableSampleMethodOidIndexId,
+ TABLESAMPLEMETHODOID,
+ TABLESAMPLEMETHODNAME,
+ Anum_pg_tablesample_method_tsmname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ true
+ },
};
/*
@@ -528,7 +541,9 @@ ObjectTypeMap[] =
/* OCLASS_EVENT_TRIGGER */
{ "event trigger", OBJECT_EVENT_TRIGGER },
/* OCLASS_POLICY */
- { "policy", OBJECT_POLICY }
+ { "policy", OBJECT_POLICY },
+ /* OCLASS_TABLESAMPLEMETHOD */
+ { "tablesample method", OBJECT_TABLESAMPLEMETHOD }
};
@@ -670,6 +685,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_FDW:
case OBJECT_FOREIGN_SERVER:
case OBJECT_EVENT_TRIGGER:
+ case OBJECT_TABLESAMPLEMETHOD:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -896,6 +912,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_EVENT_TRIGGER:
msg = gettext_noop("event trigger name cannot be qualified");
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -956,6 +975,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_event_trigger_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ address.classId = TableSampleMethodRelationId;
+ address.objectId = get_tablesample_method_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -1720,6 +1744,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
break;
case OBJECT_TSPARSER:
case OBJECT_TSTEMPLATE:
+ case OBJECT_TABLESAMPLEMETHOD:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -2654,6 +2679,21 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("tablesample method %s"),
+ NameStr(((Form_pg_tablesample_method) GETSTRUCT(tup))->tsmname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -3131,6 +3171,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "policy");
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ appendStringInfoString(&buffer, "tablesample method");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4025,6 +4069,23 @@ getObjectIdentityParts(const ObjectAddress *object,
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+ Form_pg_tablesample_method tsmForm;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ tsmForm = (Form_pg_tablesample_method) GETSTRUCT(tup);
+ appendStringInfoString(&buffer,
+ quote_identifier(NameStr(tsmForm->tsmname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..04fcd8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o tablecmds.o tablesample.o \
+ tablespace.o trigger.o tsearchcmds.o typecmds.o user.o vacuum.o \
+ vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index e5185ba..04d29a2 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -421,6 +421,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
args = strVal(linitial(objargs));
}
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
default:
elog(ERROR, "unexpected object type (%d)", (int) objtype);
break;
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index a33a5ad..f20e9f7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -97,6 +97,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SEQUENCE", true},
{"SERVER", true},
{"TABLE", true},
+ {"TABLESAMPLE METHOD", true},
{"TABLESPACE", false},
{"TRIGGER", true},
{"TEXT SEARCH CONFIGURATION", true},
@@ -1078,6 +1079,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_SEQUENCE:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
+ case OBJECT_TABLESAMPLEMETHOD:
case OBJECT_TRIGGER:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
@@ -1134,6 +1136,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_TABLESAMPLEMETHOD:
return true;
case MAX_OCLASS:
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 66d5083..b67c560 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8059,6 +8059,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
case OCLASS_USER_MAPPING:
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
+ case OCLASS_TABLESAMPLEMETHOD:
/*
* We don't expect any of these sorts of objects to depend on
diff --git a/src/backend/commands/tablesample.c b/src/backend/commands/tablesample.c
new file mode 100644
index 0000000..58cddf5
--- /dev/null
+++ b/src/backend/commands/tablesample.c
@@ -0,0 +1,398 @@
+/*-------------------------------------------------------------------------
+ *
+ * tablesample.c
+ * Commands to manipulate tablesample methods
+ *
+ * Table sampling methods provide algorithms for doing sample scan over
+ * the table.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/tablesample.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "parser/parse_func.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+
+
+static Datum
+get_tabmesample_method_func(DefElem *defel, int attnum)
+{
+ List *funcName = defGetQualifiedName(defel);
+ /* Big enough size for our needs. */
+ Oid *typeId = palloc0(6 * sizeof(Oid));
+ Oid retTypeId;
+ int nargs;
+ Oid procOid = InvalidOid;
+ FuncCandidateList clist;
+
+ switch (attnum)
+ {
+ case Anum_pg_tablesample_method_tsminit:
+ /*
+ * tsminit needs special handling because it is defined as a function
+ * with 3 or more arguments, of which only the first two must have a
+ * specific type; the rest are up to the tablesample method creator.
+ */
+ {
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ retTypeId = VOIDOID;
+
+ clist = FuncnameGetCandidates(funcName, -1, NIL, false, false, false);
+
+ while (clist)
+ {
+ if (clist->nargs >= 3 &&
+ memcmp(typeId, clist->args, nargs * sizeof(Oid)) == 0)
+ {
+ procOid = clist->oid;
+ /* Save real function signature for future errors. */
+ nargs = clist->nargs;
+ pfree(typeId);
+ typeId = clist->args;
+ break;
+ }
+ clist = clist->next;
+ }
+
+ if (!OidIsValid(procOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("function \"%s\" does not exist or does not have valid signature",
+ NameListToString(funcName)),
+ errhint("The tablesample method init function "
+ "must have at least 3 input parameters "
+ "with first one of type INTERNAL and second of type INTEGER.")));
+ }
+ break;
+
+ case Anum_pg_tablesample_method_tsmnextblock:
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = BOOLOID;
+ retTypeId = INT4OID;
+ break;
+ case Anum_pg_tablesample_method_tsmnexttuple:
+ nargs = 4;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INT2OID;
+ typeId[3] = BOOLOID;
+ retTypeId = INT2OID;
+ break;
+ case Anum_pg_tablesample_method_tsmend:
+ case Anum_pg_tablesample_method_tsmreset:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ case Anum_pg_tablesample_method_tsmcost:
+ nargs = 6;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INTERNALOID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = INTERNALOID;
+ typeId[4] = INTERNALOID;
+ typeId[5] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ default:
+ /* should not be here */
+ elog(ERROR, "unrecognized attribute for tablesample method: %d",
+ attnum);
+ nargs = 0; /* keep compiler quiet */
+ }
+
+ if (!OidIsValid(procOid))
+ procOid = LookupFuncName(funcName, nargs, typeId, false);
+ if (get_func_rettype(procOid) != retTypeId)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("function %s should return type %s",
+ func_signature_string(funcName, nargs, NIL, typeId),
+ format_type_be(retTypeId))));
+
+ return ObjectIdGetDatum(procOid);
+}
+
+/*
+ * make pg_depend entries for a new pg_tablesample_method entry
+ */
+static void
+makeParserDependencies(HeapTuple tuple)
+{
+ Form_pg_tablesample_method tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ ObjectAddress myself,
+ referenced;
+
+ myself.classId = TableSampleMethodRelationId;
+ myself.objectId = HeapTupleGetOid(tuple);
+ myself.objectSubId = 0;
+
+ /* dependency on extension */
+ recordDependencyOnCurrentExtension(&myself, false);
+
+ /* dependencies on functions */
+ referenced.classId = ProcedureRelationId;
+ referenced.objectSubId = 0;
+
+ referenced.objectId = tsm->tsminit;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnextblock;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnexttuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmend;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmreset;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmcost;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+}
+
+/*
+ * Create a table sampling method
+ *
+ * Only superusers can create table sampling methods.
+ */
+Oid
+DefineTablesampleMethod(List *names, List *parameters)
+{
+ char *tsmname = strVal(linitial(names));
+ Oid tsmoid;
+ ListCell *pl;
+ Relation rel;
+ Datum values[Natts_pg_tablesample_method];
+ bool nulls[Natts_pg_tablesample_method];
+ HeapTuple tuple;
+
+ /* Must be superuser. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to create tablesample method \"%s\"",
+ tsmname),
+ errhint("Must be superuser to create a tablesample method.")));
+
+ /* Must not already exist. */
+ tsmoid = get_tablesample_method_oid(tsmname, true);
+ if (OidIsValid(tsmoid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("tablesample method \"%s\" already exists",
+ tsmname)));
+
+ /* Initialize the values. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_tablesample_method_tsmname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(tsmname));
+
+ /*
+ * loop over the definition list and extract the information we need.
+ */
+ foreach(pl, parameters)
+ {
+ DefElem *defel = (DefElem *) lfirst(pl);
+
+ if (pg_strcasecmp(defel->defname, "init") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsminit - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsminit);
+ }
+ else if (pg_strcasecmp(defel->defname, "nextblock") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnextblock - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnextblock);
+ }
+ else if (pg_strcasecmp(defel->defname, "nexttuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnexttuple - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnexttuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "end") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmend - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmend);
+ }
+ else if (pg_strcasecmp(defel->defname, "reset") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmreset - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmreset);
+ }
+ else if (pg_strcasecmp(defel->defname, "cost") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmcost - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmcost);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("tablesample method parameter \"%s\" not recognized",
+ defel->defname)));
+ }
+
+ /*
+ * Validation.
+ */
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsminit - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method init function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnextblock - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nextblock function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnexttuple - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nexttuple function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmend - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method end function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmreset - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method reset function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmcost - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method cost function is required")));
+
+ /*
+ * Insert tuple into pg_tablesample_method.
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = heap_form_tuple(rel->rd_att, values, nulls);
+
+ tsmoid = simple_heap_insert(rel, tuple);
+
+ CatalogUpdateIndexes(rel, tuple);
+
+ makeParserDependencies(tuple);
+
+ heap_freetuple(tuple);
+
+ /* Post creation hook for new tablesample method */
+ InvokeObjectPostCreateHook(TableSampleMethodRelationId, tsmoid, 0);
+
+ heap_close(rel, RowExclusiveLock);
+
+ return tsmoid;
+}
+
+/*
+ * Drop a tablesample method.
+ */
+void
+RemoveTablesampleMethodById(Oid tsmoid)
+{
+ Relation rel;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+
+ /*
+ * Find the target tuple
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tsmoid));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ tsmoid);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ /* Can't drop builtin tablesample methods. */
+ if (tsmoid == TABLESAMPLE_METHOD_SYSTEM_OID ||
+ tsmoid == TABLESAMPLE_METHOD_BERNOULLI_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied for tablesample method %s",
+ NameStr(tsm->tsmname))));
+
+ /*
+ * Remove the pg_tablesample_method tuple (this will roll back if we fail below)
+ */
+ simple_heap_delete(rel, &tuple->t_self);
+
+ ReleaseSysCache(tuple);
+
+ heap_close(rel, RowExclusiveLock);
+}
+
+/*
+ * get_tablesample_method_oid - given a tablesample method name,
+ * look up the OID
+ *
+ * If missing_ok is false, throw an error if tablesample method name not found.
+ * If true, just return InvalidOid.
+ */
+Oid
+get_tablesample_method_oid(const char *tsmname, bool missing_ok)
+{
+ Oid result;
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(tsmname));
+ if (HeapTupleIsValid(tuple))
+ {
+ result = HeapTupleGetOid(tuple);
+ ReleaseSysCache(tuple);
+ }
+ else
+ result = InvalidOid;
+
+ if (!OidIsValid(result) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ tsmname)));
+
+ return result;
+}
+
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 01d72d4..4bf4aff 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -586,7 +586,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
- MAPPING MATCH MATERIALIZED MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
+ MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P
+ MOVE
NAME_P NAMES NATIONAL NATURAL NCHAR NEXT NO NONE
NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -5094,6 +5095,15 @@ DefineStmt:
n->definition = list_make1(makeDefElem("from", (Node *) $5));
$$ = (Node *)n;
}
+ | CREATE TABLESAMPLE METHOD name definition
+ {
+ DefineStmt *n = makeNode(DefineStmt);
+ n->kind = OBJECT_TABLESAMPLEMETHOD;
+ n->args = NIL;
+ n->defnames = list_make1(makeString($4));
+ n->definition = $5;
+ $$ = (Node *)n;
+ }
;
definition: '(' def_list ')' { $$ = $2; }
@@ -5552,6 +5562,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | TABLESAMPLE METHOD { $$ = OBJECT_TABLESAMPLEMETHOD; }
;
any_name_list:
@@ -13313,6 +13324,7 @@ unreserved_keyword:
| MATCH
| MATERIALIZED
| MAXVALUE
+ | METHOD
| MINUTE_P
| MINVALUE
| MODE
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 3533cfa..532256d 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -23,6 +23,7 @@
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/namespace.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/toasting.h"
#include "commands/alter.h"
#include "commands/async.h"
@@ -1106,6 +1107,11 @@ ProcessUtilitySlow(Node *parsetree,
Assert(stmt->args == NIL);
DefineCollation(stmt->defnames, stmt->definition);
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ Assert(stmt->args == NIL);
+ Assert(list_length(stmt->defnames) == 1);
+ DefineTablesampleMethod(stmt->defnames, stmt->definition);
+ break;
default:
elog(ERROR, "unrecognized define stmt type: %d",
(int) stmt->kind);
@@ -1960,6 +1966,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_POLICY:
tag = "DROP POLICY";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "DROP TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
@@ -2056,6 +2065,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_COLLATION:
tag = "CREATE COLLATION";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "CREATE TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 6481ac8..30653f8 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -148,6 +148,7 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
+ OCLASS_TABLESAMPLEMETHOD, /* pg_tablesample_method */
MAX_OCLASS /* MUST BE LAST */
} ObjectClass;
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
index 0e4a716..4ae8364 100644
--- a/src/include/catalog/pg_tablesample_method.h
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -64,7 +64,18 @@ typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
DATA(insert OID = 3283 ( system tsm_system_init tsm_system_nextblock tsm_system_nexttuple tsm_system_end tsm_system_reset tsm_system_cost ));
DESCR("SYSTEM table sampling method");
+#define TABLESAMPLE_METHOD_SYSTEM_OID 3283
DATA(insert OID = 3284 ( bernoulli tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
DESCR("BERNOULLI table sampling method");
+#define TABLESAMPLE_METHOD_BERNOULLI_OID 3284
+
+/* ----------------
+ * functions for manipulation of pg_tablesample_method
+ * ----------------
+ */
+
+extern Oid DefineTablesampleMethod(List *names, List *parameters);
+extern void RemoveTablesampleMethodById(Oid tsmoid);
+extern Oid get_tablesample_method_oid(const char *tsmname, bool missing_ok);
#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d87343f..54c4ba5 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1267,6 +1267,7 @@ typedef enum ObjectType
OBJECT_SEQUENCE,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
+ OBJECT_TABLESAMPLEMETHOD,
OBJECT_TABLESPACE,
OBJECT_TRIGGER,
OBJECT_TSCONFIGURATION,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 67c3b1f..33c0f3d 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -877,7 +877,7 @@ typedef struct TidPath
typedef struct SamplePath
{
Path path;
- Oid tsmcost; /* table sample method costing function */
+ Oid tsmcost; /* tablesample method costing function */
List *tsmargs; /* arguments to a TABLESAMPLE clause */
} SamplePath;
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 6ff7b44..c3269c0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -236,6 +236,7 @@ PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD)
+PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD)
PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("minvalue", MINVALUE, UNRESERVED_KEYWORD)
PG_KEYWORD("mode", MODE, UNRESERVED_KEYWORD)
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 93d93af..37ea524 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -9,7 +9,8 @@ SUBDIRS = \
worker_spi \
dummy_seclabel \
test_shm_mq \
- test_parser
+ test_parser \
+ tablesample
all: submake-errcodes
diff --git a/src/test/modules/tablesample/.gitignore b/src/test/modules/tablesample/.gitignore
new file mode 100644
index 0000000..5dcb3ff
--- /dev/null
+++ b/src/test/modules/tablesample/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/tablesample/Makefile b/src/test/modules/tablesample/Makefile
new file mode 100644
index 0000000..f5220a9
--- /dev/null
+++ b/src/test/modules/tablesample/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/tablesample/Makefile
+
+MODULE_big = tsm_test
+OBJS = tsm_test.o $(WIN32RES)
+PGFILEDESC = "tsm_test - example of a custom tablesample method"
+
+EXTENSION = tsm_test
+DATA = tsm_test--1.0.sql
+
+REGRESS = tablesample
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/tablesample/expected/tablesample.out b/src/test/modules/tablesample/expected/tablesample.out
new file mode 100644
index 0000000..95c8036
--- /dev/null
+++ b/src/test/modules/tablesample/expected/tablesample.out
@@ -0,0 +1,39 @@
+CREATE EXTENSION tsm_test;
+CREATE TABLE test_tsm AS SELECT md5(i::text) a FROM generate_series(1,10) g(i);
+SELECT * FROM test_tsm TABLESAMPLE tsm_test(true);
+ a
+----------------------------------
+ c4ca4238a0b923820dcc509a6f75849b
+ c81e728d9d4c2f636f067f89cc14862c
+ eccbc87e4b5ce2fe28308fd9f2a7baf3
+ a87ff679a2f3e71d9181a67b7542122c
+ e4da3b7fbbce2345d7772b0674a318d5
+ 1679091c5a880faf6fb5e6087eb1b2dc
+ 8f14e45fceea167a5a36dedd4bea2543
+ c9f0f895fb98ab9159f51fd0297e236d
+ 45c48cce2e2d7fbdea1afc51c7c6ad26
+ d3d9446802a44259755d38e6d163e820
+(10 rows)
+
+SELECT * FROM test_tsm TABLESAMPLE tsm_test(false);
+ a
+---
+(0 rows)
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test(true);
+DROP TABLESAMPLE METHOD tsm_test;
+ERROR: cannot drop tablesample method tsm_test because extension tsm_test requires it
+HINT: You can drop extension tsm_test instead.
+DROP EXTENSION tsm_test;
+ERROR: cannot drop extension tsm_test because other objects depend on it
+DETAIL: view test_tsm_v depends on tablesample method tsm_test
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
+DROP EXTENSION tsm_test CASCADE;
+NOTICE: drop cascades to view test_tsm_v
+SELECT * FROM pg_tablesample_method;
+ tsmname | tsminit | tsmnextblock | tsmnexttuple | tsmend | tsmreset | tsmcost
+-----------+--------------------+-------------------------+-------------------------+-------------------+---------------------+--------------------
+ system | tsm_system_init | tsm_system_nextblock | tsm_system_nexttuple | tsm_system_end | tsm_system_reset | tsm_system_cost
+ bernoulli | tsm_bernoulli_init | tsm_bernoulli_nextblock | tsm_bernoulli_nexttuple | tsm_bernoulli_end | tsm_bernoulli_reset | tsm_bernoulli_cost
+(2 rows)
+
diff --git a/src/test/modules/tablesample/sql/tablesample.sql b/src/test/modules/tablesample/sql/tablesample.sql
new file mode 100644
index 0000000..70997bd
--- /dev/null
+++ b/src/test/modules/tablesample/sql/tablesample.sql
@@ -0,0 +1,14 @@
+CREATE EXTENSION tsm_test;
+
+CREATE TABLE test_tsm AS SELECT md5(i::text) a FROM generate_series(1,10) g(i);
+
+SELECT * FROM test_tsm TABLESAMPLE tsm_test(true);
+SELECT * FROM test_tsm TABLESAMPLE tsm_test(false);
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test(true);
+
+DROP TABLESAMPLE METHOD tsm_test;
+DROP EXTENSION tsm_test;
+DROP EXTENSION tsm_test CASCADE;
+
+SELECT * FROM pg_tablesample_method;
diff --git a/src/test/modules/tablesample/tsm_test--1.0.sql b/src/test/modules/tablesample/tsm_test--1.0.sql
new file mode 100644
index 0000000..6cfa014
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/tablesample/tsm_test--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION tsm_test" to load this file. \quit
+
+CREATE FUNCTION tsm_test_init(internal, int4, bool)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nextblock(internal, bool)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nexttuple(internal, int4, int2, bool)
+RETURNS int2
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_end(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_reset(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_cost(internal, internal, internal, internal, internal, internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+
+CREATE TABLESAMPLE METHOD tsm_test (
+ INIT = tsm_test_init,
+ NEXTBLOCK = tsm_test_nextblock,
+ NEXTTUPLE = tsm_test_nexttuple,
+ END = tsm_test_end,
+ RESET = tsm_test_reset,
+ COST = tsm_test_cost
+);
diff --git a/src/test/modules/tablesample/tsm_test.c b/src/test/modules/tablesample/tsm_test.c
new file mode 100644
index 0000000..98d5721
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.c
@@ -0,0 +1,179 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_test.c
+ * Simple example of a custom tablesample method
+ *
+ * Copyright (c) 2007-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/tablesample/tsm_test.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+
+PG_MODULE_MAGIC;
+
+/* State */
+typedef struct
+{
+ bool ret;
+ BlockNumber tblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ OffsetNumber lt; /* last tuple returned from current block */
+} tsm_test_state;
+
+
+PG_FUNCTION_INFO_V1(tsm_test_init);
+PG_FUNCTION_INFO_V1(tsm_test_nextblock);
+PG_FUNCTION_INFO_V1(tsm_test_nexttuple);
+PG_FUNCTION_INFO_V1(tsm_test_end);
+PG_FUNCTION_INFO_V1(tsm_test_reset);
+PG_FUNCTION_INFO_V1(tsm_test_cost);
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_test_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ bool ret;
+ Relation rel = scanstate->ss.ss_currentRelation;
+ tsm_test_state *state;
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("Return parameter cannot be NULL.")));
+
+ ret = PG_GETARG_BOOL(2);
+
+ state = palloc0(sizeof(tsm_test_state));
+
+ /* Remember initial values for reinit */
+ state->ret = ret;
+ state->tblocks = RelationGetNumberOfBlocks(rel);
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+ scanstate->tsmdata = (void *) state;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_test_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ if (!state->ret)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ if (state->blockno == InvalidBlockNumber)
+ state->blockno = 0;
+ else if (++state->blockno >= state->tblocks)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ PG_RETURN_UINT32(state->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ */
+Datum
+tsm_test_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ if (!state->ret)
+ PG_RETURN_UINT16(InvalidOffsetNumber);
+
+ if (state->lt == InvalidOffsetNumber)
+ state->lt = FirstOffsetNumber;
+ else if (++state->lt > maxoffset)
+ PG_RETURN_UINT16(InvalidOffsetNumber);
+
+ PG_RETURN_UINT16(state->lt);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_test_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_test_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_test_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ SamplePath *path = (SamplePath *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(3);
+ double *tuples = (double *) PG_GETARG_POINTER(4);
+ List *args = path->tsmargs;
+ Node *pctnode;
+ bool ret;
+
+ SamplerAccessStrategy *strategy = (SamplerAccessStrategy *) PG_GETARG_POINTER(5);
+
+ *strategy = SAS_SEQUENTIAL;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ ret = DatumGetBool(((Const *) pctnode)->constvalue);
+ else
+ ret = true;
+
+ *pages = ret ? baserel->pages : 0;
+ *tuples = ret ? baserel->tuples : 0;
+
+ PG_RETURN_VOID();
+}
+
diff --git a/src/test/modules/tablesample/tsm_test.control b/src/test/modules/tablesample/tsm_test.control
new file mode 100644
index 0000000..a7b2741
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.control
@@ -0,0 +1,5 @@
+# tsm_test extension
+comment = 'test module for custom tablesample method'
+default_version = '1.0'
+module_pathname = '$libdir/tsm_test'
+relocatable = true
--
1.9.1
On Fri, Jan 9, 2015 at 1:10 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 06/01/15 14:22, Petr Jelinek wrote:
On 06/01/15 08:51, Michael Paquier wrote:
On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Attached is v3 which besides the fixes mentioned above also includes changes
discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for
crash with FETCH FIRST and is rebased against current master.
This patch needs a rebase, there is a small conflict in parallel_schedule.
Sigh, I really wish we had automation that checks this automatically for
patches in CF.
Here is rebase against current master.
Thanks!
Structurally speaking, I think that the tsm methods should be added in
src/backend/utils and not src/backend/access which is more high-level
as tsm_bernoulli.c and tsm_system.c contain only a set of new
procedure functions. Having a single header file tsm.h would be also a
good thing.
I am not sure if I parsed this correctly, do you mean to say that only
low-level access functions belong to src/backend/access? Makes sense.
Made this change.
Moved into single tablesample.h file.
Regarding the naming, is "tsm" (table sample method) really appealing?
Wouldn't it be better to use simply tablesample_* for the file names
and the method names?
I created the src/backend/tablesample and files are named just system.c and
bernoulli.c, but I kept tsm_ prefix for methods as they become too long for
my taste when prefixing with tablesample_.
This is a large patch... Wouldn't sampling.[c|h] extracted from
ANALYZE live better as a refactoring patch? This would limit a bit bug
occurrences on the main patch.
That's a good idea, I'll split it into patch series.
I've split the sampling.c/h into separate patch.
I also wrote basic CREATE/DROP TABLESAMPLE METHOD support, again as separate
patch in the attached patch-set. This also includes modules test with simple
custom tablesample method.
There are some very minor cleanups in the main tablesample code itself but
no functional changes.
Some comments about the 1st patch:
1) Nitpicking:
+ * Block sampling routines shared by ANALYZE and TABLESAMPLE.
TABLESAMPLE is not added yet. You may as well simplify this
description with something like "Sampling routines for relation
blocks".
2) file_fdw and postgres_fdw do not compile correctly as they still
use anl_random_fract. This function is still mentioned in vacuum.h as
well.
3) Not really an issue of this patch, but I'd think that this comment
should be reworked:
+ * BlockSampler is used for stage one of our new two-stage tuple
+ * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
+ * "Large DB"). It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.
4) As a refactoring patch, why is the function providing a random
value changed? Shouldn't sample_random_fract be consistent with
anl_random_fract?
5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and
RAND48_SEED_2 instead of being hardcoded?
Regards,
--
Michael
On 09/01/15 09:27, Michael Paquier wrote:
Some comments about the 1st patch:
1) Nitpicking:
+ * Block sampling routines shared by ANALYZE and TABLESAMPLE.
TABLESAMPLE is not added yet. You may as well simplify this
description with something like "Sampling routines for relation
blocks".
Changed.
2) file_fdw and postgres_fdw do not compile correctly as they still
use anl_random_fract. This function is still mentioned in vacuum.h as
well.
Gah, didn't notice this, fixed. Also, since Vitter's reservoir
sampling methods are used by components other than just ANALYZE, I moved
those to sampling.c/h.
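For readers unfamiliar with reservoir sampling, here is a minimal sketch of the basic idea. Note that this is the simple "Algorithm R" variant, not Vitter's Algorithm Z that the patch moves into sampling.c (Algorithm Z computes how many records to skip instead of drawing a random number per record). All demo_* names are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for sampler_random_fract(): uniform value in (0, 1). */
static double
demo_random_fract(void)
{
	return ((double) rand() + 1.0) / ((double) RAND_MAX + 2.0);
}

/*
 * Keep a uniform random sample of at most n items from the stream
 * 0 .. nitems-1.  Returns the number of reservoir slots filled.
 */
static int
demo_reservoir_sample(int *sample, int n, int nitems)
{
	int			filled = 0;
	int			t;

	assert(n > 0);

	for (t = 0; t < nitems; t++)
	{
		if (filled < n)
			sample[filled++] = t;	/* reservoir not full yet */
		else
		{
			/* pick a slot in [0, t]; item t survives only if slot < n */
			int			k = (int) ((t + 1) * demo_random_fract());

			if (k < n)
				sample[k] = t;
		}
	}
	return filled;
}
```

The replacement step corresponds to the "replacing one old tuple at random" code in acquire_sample_rows; the skip-count logic in reservoir_get_next_S is what avoids one random draw per row.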
3) Not really an issue of this patch, but I'd think that this comment
should be reworked:
+ * BlockSampler is used for stage one of our new two-stage tuple
+ * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
+ * "Large DB"). It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.
I changed the wording slightly, it's still not too great though.
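As a reference point for that comment, the block selection it describes can be sketched as below. This is the naive form of Knuth's Algorithm S with one random draw per block, not the optimized loop in BlockSampler_Next (which draws once per selected block). The demo_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: uniform value in (0, 1). */
static double
demo_uniform(void)
{
	return ((double) rand() + 1.0) / ((double) RAND_MAX + 2.0);
}

/*
 * Choose samplesize block numbers out of 0 .. nblocks-1, in increasing
 * order.  If the table has fewer blocks than samplesize, all blocks are
 * chosen.  Returns the number of blocks written to chosen[].
 */
static int
demo_select_blocks(unsigned *chosen, unsigned nblocks, int samplesize)
{
	unsigned	t;				/* blocks examined so far */
	int			m = 0;			/* blocks selected so far */

	for (t = 0; t < nblocks && m < samplesize; t++)
	{
		unsigned	K = nblocks - t;	/* remaining blocks */
		int			k = samplesize - m; /* blocks still to sample */

		/* select block t with probability k/K; forced once k >= K */
		if ((unsigned) k >= K || demo_uniform() * K < k)
			chosen[m++] = t;
	}
	return m;
}
```

Because the selection becomes forced once the remaining blocks equal the blocks still needed, the loop always ends with exactly min(samplesize, nblocks) blocks chosen.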
4) As a refactoring patch, why is the function providing a random
value changed? Shouldn't sample_random_fract be consistent with
anl_random_fract?
Yeah I needed that for TABLESAMPLE and it should not really affect the
randomness but you are correct it should be part of second patch of the
patch-set.
5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and
RAND48_SEED_2 instead of being hardcoded?
Regards,
Removed this part from the first patch as it's not needed there anymore.
In the second patch, which implements TABLESAMPLE itself, I changed the
implementation of the random generator, because when I looked at the code
again I realized the old one would produce wrong results if there were
multiple TABLESAMPLE statements in the same query or multiple cursors in
the same transaction.
In addition to the above changes, I added a test for cursors and a test for
the random generator issue mentioned above. I also fixed some typos
in comments and in a function name. Finally, I added a note to the docs saying
that the same REPEATABLE value might produce different results in subsequent queries.
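To illustrate the kind of problem a shared generator causes, here is a rough sketch of a generator whose state lives with each scan instead of being global. This is not the code from the patch (the second patch is not shown here); the struct, the function names and the choice of a drand48-style linear congruential step are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-scan generator state; not the patch's actual struct. */
typedef struct DemoSamplerRandomState
{
	uint64_t	x;				/* 48-bit LCG state */
} DemoSamplerRandomState;

/* Seed one scan's private state (same layout srand48 uses). */
static void
demo_sampler_random_init(DemoSamplerRandomState *st, uint32_t seed)
{
	st->x = (((uint64_t) seed) << 16) | 0x330E;
}

/* Return a uniform value in (0, 1), advancing only this scan's state. */
static double
demo_sampler_random_fract(DemoSamplerRandomState *st)
{
	/* drand48-style step: x = (a * x + c) mod 2^48 */
	st->x = (UINT64_C(0x5DEECE66D) * st->x + 0xB) & UINT64_C(0xFFFFFFFFFFFF);

	/* 281474976710656 is 2^48; the +1 terms keep the result off 0 and 1 */
	return ((double) st->x + 1.0) / (281474976710656.0 + 1.0);
}
```

With one global generator, two scans that interleave their calls each see only part of the sequence, so a given seed no longer determines what either scan returns. With private state, two scans seeded identically produce identical sequences regardless of how their calls are interleaved.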
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-separate-block-sampling-functions-v2.patch (text/x-diff)
From 0d61cf71bd65ea054f0b80a12fb5c70bf652a0dc Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:36:56 +0100
Subject: [PATCH 1/3] separate block sampling functions v2
---
contrib/file_fdw/file_fdw.c | 9 +-
contrib/postgres_fdw/postgres_fdw.c | 10 +-
src/backend/commands/analyze.c | 225 +----------------------------------
src/backend/utils/misc/Makefile | 2 +-
src/backend/utils/misc/sampling.c | 226 ++++++++++++++++++++++++++++++++++++
src/include/commands/vacuum.h | 3 -
src/include/utils/sampling.h | 44 +++++++
7 files changed, 287 insertions(+), 232 deletions(-)
create mode 100644 src/backend/utils/misc/sampling.c
create mode 100644 src/include/utils/sampling.h
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index d569760..df732c0 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -34,6 +34,7 @@
#include "optimizer/var.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -1005,7 +1006,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
{
int numrows = 0;
double rowstoskip = -1; /* -1 means not set yet */
- double rstate;
+ ReservoirStateData rstate;
TupleDesc tupDesc;
Datum *values;
bool *nulls;
@@ -1043,7 +1044,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
ALLOCSET_DEFAULT_MAXSIZE);
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
@@ -1087,7 +1088,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* not-yet-incremented value of totalrows as t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(*totalrows, targrows, &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, *totalrows, targrows);
if (rowstoskip <= 0)
{
@@ -1095,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d76e739..cbcba6e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -37,6 +37,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -202,7 +203,7 @@ typedef struct PgFdwAnalyzeState
/* for random sampling */
double samplerows; /* # of rows fetched */
double rowstoskip; /* # of rows to skip before next sample */
- double rstate; /* random state */
+ ReservoirStateData rstate; /* state for reservoir sampling */
/* working memory contexts */
MemoryContext anl_cxt; /* context for per-analyze lifespan data */
@@ -2393,7 +2394,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
astate.numrows = 0;
astate.samplerows = 0;
astate.rowstoskip = -1; /* -1 means not set yet */
- astate.rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&astate.rstate, targrows);
/* Remember ANALYZE context, and create a per-tuple temp context */
astate.anl_cxt = CurrentMemoryContext;
@@ -2533,13 +2534,12 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
* analyze.c; see Jeff Vitter's paper.
*/
if (astate->rowstoskip < 0)
- astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
- &astate->rstate);
+ astate->rowstoskip = reservoir_get_next_S(&astate->rstate, astate->samplerows, targrows);
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * anl_random_fract());
+ pos = (int) (targrows * sampler_random_fract());
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 5de2b39..ae44d22 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -88,10 +78,6 @@ static BufferAccessStrategy vac_strategy;
static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -947,94 +933,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1080,7 +978,7 @@ acquire_sample_rows(Relation onerel, int elevel,
BlockNumber totalblocks;
TransactionId OldestXmin;
BlockSamplerData bs;
- double rstate;
+ ReservoirStateData rstate;
Assert(targrows > 0);
@@ -1090,9 +988,9 @@ acquire_sample_rows(Relation onerel, int elevel,
OldestXmin = GetOldestXmin(onerel, true);
/* Prepare for sampling block numbers */
- BlockSampler_Init(&bs, totalblocks, targrows);
+ BlockSampler_Init(&bs, totalblocks, targrows, random());
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Outer loop over blocks to sample */
while (BlockSampler_HasMore(&bs))
@@ -1240,8 +1138,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(samplerows, targrows,
- &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, samplerows, targrows);
if (rowstoskip <= 0)
{
@@ -1249,7 +1146,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1308,116 +1205,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
-/*
- * These two routines embody Algorithm Z from "Random sampling with a
- * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
- * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
- * of the count S of records to skip before processing another record.
- * It is computed primarily based on t, the number of records already read.
- * The only extra state needed between calls is W, a random state variable.
- *
- * anl_init_selection_state computes the initial W value.
- *
- * Given that we've already read t records (t >= n), anl_get_next_S
- * determines the number of records to skip before the next record is
- * processed.
- */
-double
-anl_init_selection_state(int n)
-{
- /* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
-}
-
-double
-anl_get_next_S(double t, int n, double *stateptr)
-{
- double S;
-
- /* The magic constant here is T from Vitter's paper */
- if (t <= (22.0 * n))
- {
- /* Process records using Algorithm X until t is large enough */
- double V,
- quot;
-
- V = anl_random_fract(); /* Generate V */
- S = 0;
- t += 1;
- /* Note: "num" in Vitter's code is always equal to t - n */
- quot = (t - (double) n) / t;
- /* Find min S satisfying (4.1) */
- while (quot > V)
- {
- S += 1;
- t += 1;
- quot *= (t - (double) n) / t;
- }
- }
- else
- {
- /* Now apply Algorithm Z */
- double W = *stateptr;
- double term = t - (double) n + 1;
-
- for (;;)
- {
- double numer,
- numer_lim,
- denom;
- double U,
- X,
- lhs,
- rhs,
- y,
- tmp;
-
- /* Generate U and X */
- U = anl_random_fract();
- X = t * (W - 1.0);
- S = floor(X); /* S is tentatively set to floor(X) */
- /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
- tmp = (t + 1) / term;
- lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
- rhs = (((t + X) / (term + S)) * term) / t;
- if (lhs <= rhs)
- {
- W = rhs / lhs;
- break;
- }
- /* Test if U <= f(S)/cg(X) */
- y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
- if ((double) n < S)
- {
- denom = t;
- numer_lim = term + S;
- }
- else
- {
- denom = t - (double) n + S;
- numer_lim = t + 1;
- }
- for (numer = t + S; numer >= numer_lim; numer -= 1)
- {
- y *= numer / denom;
- denom -= 1;
- }
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
- if (exp(log(y) / n) <= (t + X) / t)
- break;
- }
- *stateptr = W;
- }
- return S;
-}
-
/*
* qsort comparator for sorting rows[] array
*/
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 449d5b4..848ba29 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..1eeabaf
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Relation block sampling routines.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler provides an algorithm for block-level sampling of a relation,
+ * as discussed on pgsql-hackers 2004-04-02 (subject "Large DB").
+ * It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
+ long randseed)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires a sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+/*
+ * These two routines embody Algorithm Z from "Random sampling with a
+ * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
+ * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
+ * of the count S of records to skip before processing another record.
+ * It is computed primarily based on t, the number of records already read.
+ * The only extra state needed between calls is W, a random state variable.
+ *
+ * reservoir_init_selection_state computes the initial W value.
+ *
+ * Given that we've already read t records (t >= n), reservoir_get_next_S
+ * determines the number of records to skip before the next record is
+ * processed.
+ */
+void
+reservoir_init_selection_state(ReservoirState rs, int n)
+{
+ /* Initial value of W (for use when Algorithm Z is first applied) */
+ *rs = exp(-log(sampler_random_fract()) / n);
+}
+
+double
+reservoir_get_next_S(ReservoirState rs, double t, int n)
+{
+ double S;
+
+ /* The magic constant here is T from Vitter's paper */
+ if (t <= (22.0 * n))
+ {
+ /* Process records using Algorithm X until t is large enough */
+ double V,
+ quot;
+
+ V = sampler_random_fract(); /* Generate V */
+ S = 0;
+ t += 1;
+ /* Note: "num" in Vitter's code is always equal to t - n */
+ quot = (t - (double) n) / t;
+ /* Find min S satisfying (4.1) */
+ while (quot > V)
+ {
+ S += 1;
+ t += 1;
+ quot *= (t - (double) n) / t;
+ }
+ }
+ else
+ {
+ /* Now apply Algorithm Z */
+ double W = *rs;
+ double term = t - (double) n + 1;
+
+ for (;;)
+ {
+ double numer,
+ numer_lim,
+ denom;
+ double U,
+ X,
+ lhs,
+ rhs,
+ y,
+ tmp;
+
+ /* Generate U and X */
+ U = sampler_random_fract();
+ X = t * (W - 1.0);
+ S = floor(X); /* S is tentatively set to floor(X) */
+ /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
+ tmp = (t + 1) / term;
+ lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
+ rhs = (((t + X) / (term + S)) * term) / t;
+ if (lhs <= rhs)
+ {
+ W = rhs / lhs;
+ break;
+ }
+ /* Test if U <= f(S)/cg(X) */
+ y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
+ if ((double) n < S)
+ {
+ denom = t;
+ numer_lim = term + S;
+ }
+ else
+ {
+ denom = t - (double) n + S;
+ numer_lim = t + 1;
+ }
+ for (numer = t + S; numer >= numer_lim; numer -= 1)
+ {
+ y *= numer / denom;
+ denom -= 1;
+ }
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ if (exp(log(y) / n) <= (t + X) / t)
+ break;
+ }
+ *rs = W;
+ }
+ return S;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+
+/* Select a random value R uniformly distributed in the open interval (0, 1) */
+double
+sampler_random_fract(void)
+{
+ return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+}
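The one-random-number-per-selected-block trick described in the BlockSampler_Next comment can be tried outside the backend. What follows is only a minimal sketch under stated assumptions: the `sketch_*` names and the small LCG standing in for sampler_random_fract() are hypothetical and not part of the patch, and BlockNumber is replaced by uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for sampler_random_fract(): an LCG mapped into (0, 1). */
static uint32_t sketch_rng_state = 12345;

static double
sketch_random_fract(void)
{
	sketch_rng_state = sketch_rng_state * 1664525u + 1013904223u;
	return ((double) sketch_rng_state + 1.0) / 4294967297.0;	/* 2^32 + 1 */
}

typedef struct
{
	uint32_t	N;		/* number of blocks, known in advance */
	int			n;		/* desired sample size */
	uint32_t	t;		/* blocks scanned so far */
	int			m;		/* blocks selected so far */
} SketchSampler;

static void
sketch_init(SketchSampler *bs, uint32_t nblocks, int samplesize)
{
	bs->N = nblocks;
	bs->n = samplesize;
	bs->t = 0;
	bs->m = 0;
}

static int
sketch_has_more(const SketchSampler *bs)
{
	return bs->t < bs->N && bs->m < bs->n;
}

static uint32_t
sketch_next(SketchSampler *bs)
{
	uint32_t	K = bs->N - bs->t;	/* remaining blocks */
	int			k = bs->n - bs->m;	/* blocks still to sample */
	double		V;
	double		p;

	assert(sketch_has_more(bs));

	if ((uint32_t) k >= K)
	{
		/* need all the rest */
		bs->m++;
		return bs->t++;
	}

	/* One random draw per selected block; p shrinks as blocks are skipped. */
	V = sketch_random_fract();
	p = 1.0 - (double) k / (double) K;
	while (V < p)
	{
		bs->t++;
		K--;
		p *= 1.0 - (double) k / (double) K;
	}

	bs->m++;
	return bs->t++;
}

/* Collect a whole sample into out[]; returns the number of blocks chosen. */
static int
sketch_sample_blocks(uint32_t nblocks, int samplesize, uint32_t *out)
{
	SketchSampler bs;
	int			count = 0;

	sketch_init(&bs, nblocks, samplesize);
	while (sketch_has_more(&bs))
		out[count++] = sketch_next(&bs);
	return count;
}
```

Whatever the generator returns, the sample comes back in ascending block order and contains min(nblocks, samplesize) entries, which is the property the "cannot fail to select enough blocks" remark in the comment relies on.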
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4275484..d38fead 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -178,8 +178,5 @@ extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
bool in_outer_xact, BufferAccessStrategy bstrategy);
extern bool std_typanalyze(VacAttrStats *stats);
-extern double anl_random_fract(void);
-extern double anl_init_selection_state(int n);
-extern double anl_get_next_S(double t, int n, double *stateptr);
#endif /* VACUUM_H */
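The anl_* declarations removed above are replaced by the reservoir_* routines in sampling.c. For readers who want to see what the skip count S means, here is a minimal sketch of just the Algorithm X branch (the one used while t <= 22 * n). It is an illustration, not the patch's code: `sketch_algorithm_x_skip` is a hypothetical name, and the random value V is passed in as a parameter so the result is reproducible.

```c
#include <assert.h>

/*
 * Given that t records have been read (t >= n) and a uniform value V in
 * (0, 1), return the number of records to skip before the next record
 * that replaces a reservoir slot.  This mirrors the Algorithm X loop:
 * find the smallest S for which the running product drops to V or below.
 */
static double
sketch_algorithm_x_skip(double t, int n, double V)
{
	double		S = 0;
	double		quot;

	assert(n > 0 && t >= n && V > 0.0 && V < 1.0);

	t += 1;
	quot = (t - (double) n) / t;
	while (quot > V)
	{
		S += 1;
		t += 1;
		quot *= (t - (double) n) / t;
	}
	return S;
}
```

With t = 10 and n = 10 the first product is 1/11, so a V of 0.5 gives no skip at all, while smaller values of V skip progressively more records.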
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..e3e7f9c
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+extern double sampler_random_fract(void);
+
+/* Block sampling methods */
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize, long randseed);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Reservoir sampling methods */
+typedef double ReservoirStateData;
+typedef ReservoirStateData *ReservoirState;
+
+extern void reservoir_init_selection_state(ReservoirState rs, int n);
+extern double reservoir_get_next_S(ReservoirState rs, double t, int n);
+
+#endif /* SAMPLING_H */
--
1.9.1
0002-tablesample-v5.patch (text/x-diff)
From 5efa4aaab144ac31a4b5a419747de72a0a788348 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:37:55 +0100
Subject: [PATCH 2/3] tablesample v5
---
contrib/file_fdw/file_fdw.c | 2 +-
contrib/postgres_fdw/postgres_fdw.c | 2 +-
doc/src/sgml/ref/select.sgml | 38 ++-
src/backend/access/Makefile | 3 +-
src/backend/catalog/Makefile | 2 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/explain.c | 7 +
src/backend/executor/Makefile | 2 +-
src/backend/executor/execAmi.c | 8 +
src/backend/executor/execCurrent.c | 1 +
src/backend/executor/execProcnode.c | 14 +
src/backend/executor/nodeSamplescan.c | 404 ++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 57 ++++
src/backend/nodes/equalfuncs.c | 34 +++
src/backend/nodes/nodeFuncs.c | 12 +
src/backend/nodes/outfuncs.c | 58 ++++
src/backend/nodes/readfuncs.c | 42 +++
src/backend/optimizer/path/allpaths.c | 35 +++
src/backend/optimizer/path/costsize.c | 67 +++++
src/backend/optimizer/plan/createplan.c | 69 +++++
src/backend/optimizer/plan/setrefs.c | 11 +
src/backend/optimizer/plan/subselect.c | 1 +
src/backend/optimizer/util/pathnode.c | 29 ++
src/backend/parser/gram.y | 40 ++-
src/backend/parser/parse_clause.c | 38 ++-
src/backend/parser/parse_func.c | 128 +++++++++
src/backend/utils/Makefile | 3 +-
src/backend/utils/adt/ruleutils.c | 50 ++++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/misc/sampling.c | 33 ++-
src/backend/utils/tablesample/Makefile | 17 ++
src/backend/utils/tablesample/bernoulli.c | 203 ++++++++++++++
src/backend/utils/tablesample/system.c | 188 +++++++++++++
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_proc.h | 25 ++
src/include/catalog/pg_tablesample_method.h | 70 +++++
src/include/executor/nodeSamplescan.h | 24 ++
src/include/nodes/execnodes.h | 20 ++
src/include/nodes/nodes.h | 5 +
src/include/nodes/parsenodes.h | 33 +++
src/include/nodes/plannodes.h | 6 +
src/include/nodes/relation.h | 12 +
src/include/optimizer/cost.h | 1 +
src/include/optimizer/pathnode.h | 2 +
src/include/parser/kwlist.h | 3 +-
src/include/parser/parse_func.h | 4 +
src/include/utils/rel.h | 1 -
src/include/utils/sampling.h | 15 +-
src/include/utils/syscache.h | 2 +
src/include/utils/tablesample.h | 33 +++
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/expected/tablesample.out | 165 ++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/tablesample.sql | 39 +++
55 files changed, 2065 insertions(+), 27 deletions(-)
create mode 100644 src/backend/executor/nodeSamplescan.c
create mode 100644 src/backend/utils/tablesample/Makefile
create mode 100644 src/backend/utils/tablesample/bernoulli.c
create mode 100644 src/backend/utils/tablesample/system.c
create mode 100644 src/include/catalog/pg_tablesample_method.h
create mode 100644 src/include/executor/nodeSamplescan.h
create mode 100644 src/include/utils/tablesample.h
create mode 100644 src/test/regress/expected/tablesample.out
create mode 100644 src/test/regress/sql/tablesample.sql
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index df732c0..3fc3962 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1096,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index cbcba6e..95f196e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2539,7 +2539,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * sampler_random_fract());
+ pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate));
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
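Both FDW call sites above assert that the chosen reservoir slot is below targrows, which only holds if the random fraction never reaches exactly 1.0. The sketch below checks that for the formula used in the first patch of this series, (random() + 1) / (MAX_RANDOM_VALUE + 2). Whether the variant that takes a randstate argument keeps the same formula is not visible in these hunks, so treat this as an illustration only. The `sketch_*` names are hypothetical, and the value 0x7FFFFFFF for the maximum is an assumption.

```c
#include <assert.h>

#define SKETCH_MAX_RANDOM_VALUE 0x7FFFFFFF	/* assumed value of MAX_RANDOM_VALUE */

/* Map a raw generator output in [0, MAX] to a fraction strictly inside (0, 1). */
static double
sketch_fract_from_raw(long raw)
{
	assert(raw >= 0 && raw <= SKETCH_MAX_RANDOM_VALUE);
	return ((double) raw + 1) / ((double) SKETCH_MAX_RANDOM_VALUE + 2);
}

/* Pick the reservoir slot to replace, the same way the call sites do. */
static int
sketch_pick_slot(int targrows, long raw)
{
	return (int) (targrows * sketch_fract_from_raw(raw));
}
```

The +1 and +2 offsets keep both endpoints out of reach, so the slot index stays in [0, targrows - 1] even for the largest raw value, and log() of the fraction is always defined.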
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..407bf9d 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,42 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ A <literal>TABLESAMPLE</literal> clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of the rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ one of:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept only a single argument,
+ which is the percentage (a floating-point value from 0 to 100) of the
+ rows to be returned.
+ The <literal>SYSTEM</literal> sampling method does block-level
+ sampling, with each block having the same chance of being selected, and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> method scans the whole table and
+ returns individual rows with equal probability.
+ The optional numeric parameter <literal>REPEATABLE</literal> is used
+ as the random seed for sampling. Note that subsequent commands may return
+ different results even if the same <literal>REPEATABLE</literal> clause
+ was specified. This happens because <acronym>DML</acronym> statements
+ and maintenance operations such as <command>VACUUM</> affect the physical
+ distribution of data.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
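To make the BERNOULLI and REPEATABLE wording in the documentation hunk above concrete, here is a minimal sketch of row-level sampling with a seed. It assumes a hypothetical LCG and hypothetical `sketch_*` names; the patch's bernoulli.c is not reproduced here and may differ in every detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical seeded generator (an LCG) returning a value in (0, 1). */
static double
sketch_next_fract(uint32_t *state)
{
	*state = *state * 1664525u + 1013904223u;
	return ((double) *state + 1.0) / 4294967297.0;	/* 2^32 + 1 */
}

/*
 * Visit every row and keep each one independently with probability
 * percent / 100.  Selected row indexes go to out[]; returns their count.
 */
static int
sketch_bernoulli(uint32_t seed, double percent, int nrows, int *out)
{
	uint32_t	state = seed;
	double		cutoff = percent / 100.0;
	int			count = 0;

	assert(percent >= 0.0 && percent <= 100.0);

	for (int row = 0; row < nrows; row++)
	{
		if (sketch_next_fract(&state) < cutoff)
			out[count++] = row;
	}
	return count;
}
```

The same seed over the same row order yields the same sample, which is the REPEATABLE guarantee. If the physical order of rows changes between runs, the same sequence of random draws lands on different rows, which is why the documentation warns that results may still differ.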
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..238057a 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ transam
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..34db3e6 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesample_method.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index ae44d22..1dd4dcb 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1146,7 +1146,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8a0be5d..5152964 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -724,6 +724,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -950,6 +951,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ pname = sname = "Sample Scan";
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1067,6 +1071,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1319,6 +1324,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2147,6 +2153,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 6ebad2f..4948a26 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index 1c8be25..5cfe549 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 9892499..03c2feb 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..27f5f05
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,404 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/pg_tablesample_method.h"
+#include "executor/executor.h"
+#include "executor/nodeSamplescan.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "storage/bufmgr.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate, int eflags);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+
+/*
+ * Initialize the sampling method - loads function info and
+ * calls the tsminit function.
+ *
+ * We need special handling for this because the tsminit function
+ * is allowed to take optional additional arguments.
+ */
+static void
+InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(scanstate->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(scanstate->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(scanstate->tsmnexttuple));
+ fmgr_info(tablesample->tsmend, &(scanstate->tsmend));
+ fmgr_info(tablesample->tsmreset, &(scanstate->tsmreset));
+
+ InitFunctionCallInfoData(fcinfo, &scanstate->tsminit,
+ list_length(args) + 2,
+ InvalidOid, NULL, NULL);
+
+ /* First arg is always SampleScanState */
+ fcinfo.arg[0] = PointerGetDatum(scanstate);
+ fcinfo.argnull[0] = false;
+
+ /*
+ * Second arg is always the REPEATABLE seed.
+ * When tablesample->repeatable is NULL, the REPEATABLE clause was not
+ * specified and a random seed is used instead.
+ * When specified, the expression cannot evaluate to NULL.
+ */
+ if (tablesample->repeatable)
+ {
+ ExprState *argstate = ExecInitExpr((Expr *) tablesample->repeatable,
+ (PlanState *) scanstate);
+ fcinfo.arg[1] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[1], NULL);
+ if (fcinfo.argnull[1])
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("REPEATABLE clause cannot be NULL")));
+ }
+ else
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+
+ i = 2;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ (void) FunctionCallInvoke(&fcinfo);
+}
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is a workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ EState *estate;
+ TupleTableSlot *slot;
+ BlockNumber blockno = InvalidBlockNumber;
+ Snapshot snapshot;
+ Relation relation;
+ bool found = false;
+ bool retry = false;
+ OffsetNumber tupoffset, maxoffset;
+ Buffer buffer;
+ Page page;
+ HeapTuple tuple = &(node->tup);
+
+ /*
+ * get information from the estate and scan state
+ */
+ estate = node->ss.ps.state;
+ snapshot = estate->es_snapshot;
+ slot = node->ss.ss_ScanTupleSlot;
+ relation = node->ss.ss_currentRelation;
+ buffer = node->openbuffer;
+
+ if (BufferIsValid(buffer))
+ {
+ blockno = BufferGetBlockNumber(buffer);
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ /*
+ * get the next tuple from the table
+ */
+ for (;;)
+ {
+ ItemId itemid;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Load next block if needed. */
+ if (!BufferIsValid(buffer))
+ {
+ blockno = DatumGetUInt32(FunctionCall2(&node->tsmnextblock,
+ PointerGetDatum(node),
+ BoolGetDatum(retry)));
+ /* No more blocks to fetch */
+ if (!BlockNumberIsValid(blockno))
+ break;
+
+ buffer = ReadBufferExtended(relation, MAIN_FORKNUM, blockno,
+ RBM_NORMAL, NULL);
+ LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+ node->openbuffer = buffer;
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ tupoffset = DatumGetUInt16(FunctionCall4(&node->tsmnexttuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset),
+ BoolGetDatum(retry)));
+ /* Go to next block. */
+ if (!OffsetNumberIsValid(tupoffset))
+ {
+ UnlockReleaseBuffer(buffer);
+ node->openbuffer = buffer = InvalidBuffer;
+ continue;
+ }
+ retry = true;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_tableOid = RelationGetRelid(relation);
+ tuple->t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&tuple->t_self, blockno, tupoffset);
+
+ /* Found visible tuple, return it. */
+ if (HeapTupleSatisfiesVisibility(tuple, snapshot, buffer))
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (found)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ buffer, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation using the sampling method and returns
+ * the next qualifying tuple.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+ node->ss.ss_currentScanDesc = NULL;
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ scanstate->openbuffer = InvalidBuffer;
+
+ InitSamplingMethod(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell sampling function that we finished the scan.
+ */
+ FunctionCall1(&node->tsmend, PointerGetDatum(node));
+
+ if (BufferIsValid(node->openbuffer))
+ {
+ UnlockReleaseBuffer(node->openbuffer);
+ node->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *scanstate)
+{
+ if (BufferIsValid(scanstate->openbuffer))
+ {
+ UnlockReleaseBuffer(scanstate->openbuffer);
+ scanstate->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ FunctionCall1(&scanstate->tsmreset, PointerGetDatum(scanstate));
+
+ ExecScanReScan(&scanstate->ss);
+}
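The calling protocol that SampleNext follows (ask the method for a block, then ask for tuple offsets within it until the method reports the block is exhausted) can be shown with plain function pointers. This is a simplified sketch under stated assumptions: all `Sketch*` and `every_other_*` names are hypothetical, buffer locking, the retry flag and visibility checks are left out, and every block is assumed to hold the same number of tuples.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_INVALID_BLOCK	UINT32_MAX	/* stands in for InvalidBlockNumber */
#define SKETCH_INVALID_OFFSET	0			/* stands in for InvalidOffsetNumber */

typedef struct SketchMethod
{
	uint32_t	(*nextblock) (void *state);
	uint16_t	(*nexttuple) (void *state, uint32_t blockno, uint16_t maxoffset);
	void	   *state;
} SketchMethod;

/* Drive a sampling method to completion; returns the number of tuples visited. */
static int
sketch_scan(SketchMethod *method, uint16_t tuples_per_block)
{
	int			visited = 0;

	for (;;)
	{
		uint32_t	blockno = method->nextblock(method->state);

		if (blockno == SKETCH_INVALID_BLOCK)
			break;				/* no more blocks to fetch */

		for (;;)
		{
			uint16_t	off = method->nexttuple(method->state, blockno,
												tuples_per_block);

			if (off == SKETCH_INVALID_OFFSET)
				break;			/* block exhausted, go to next block */
			visited++;
		}
	}
	return visited;
}

/* Toy method: returns every second block and every tuple in it. */
typedef struct
{
	uint32_t	nblocks;
	uint32_t	nextblk;
	uint16_t	lastoff;
} EveryOther;

static uint32_t
every_other_nextblock(void *state)
{
	EveryOther *s = state;
	uint32_t	blk = s->nextblk;

	if (blk >= s->nblocks)
		return SKETCH_INVALID_BLOCK;
	s->nextblk += 2;
	s->lastoff = 0;
	return blk;
}

static uint16_t
every_other_nexttuple(void *state, uint32_t blockno, uint16_t maxoffset)
{
	EveryOther *s = state;

	(void) blockno;
	if (s->lastoff >= maxoffset)
		return SKETCH_INVALID_OFFSET;
	return ++s->lastoff;		/* offsets are 1-based */
}

static int
sketch_run_every_other(uint32_t nblocks, uint16_t tuples_per_block)
{
	EveryOther	st = {nblocks, 0, 0};
	SketchMethod m = {every_other_nextblock, every_other_nexttuple, &st};

	return sketch_scan(&m, tuples_per_block);
}
```

Keeping block choice and tuple choice as two separate callbacks is what lets a SYSTEM-style method decide only at the block level and a BERNOULLI-style method decide only at the tuple level while sharing one executor node.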
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f1a24f5..8861512 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -628,6 +628,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2006,6 +2022,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2138,6 +2155,37 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_SCALAR_FIELD(tsmcost);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4075,6 +4123,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4723,6 +4774,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_PrivGrantee:
retval = _copyPrivGrantee(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 6e8b308..1e7ebbf 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2323,6 +2323,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2442,6 +2443,33 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_SCALAR_FIELD(tsmcost);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3150,6 +3178,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_PrivGrantee:
retval = _equalPrivGrantee(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 21dfda7..bd9ce09 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3209,6 +3209,18 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->repeatable, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index dd1278b..684cd7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -578,6 +578,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -1589,6 +1597,17 @@ _outTidPath(StringInfo str, const TidPath *node)
}
static void
+_outSamplePath(StringInfo str, const SamplePath *node)
+{
+ WRITE_NODE_TYPE("SAMPLEPATH");
+
+ _outPathInfo(str, (const Path *) node);
+
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(tsmargs);
+}
+
+static void
_outForeignPath(StringInfo str, const ForeignPath *node)
{
WRITE_NODE_TYPE("FOREIGNPATH");
@@ -2391,6 +2410,33 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2420,6 +2466,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2887,6 +2934,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3092,6 +3142,9 @@ _outNode(StringInfo str, const void *obj)
case T_TidPath:
_outTidPath(str, obj);
break;
+ case T_SamplePath:
+ _outSamplePath(str, obj);
+ break;
case T_ForeignPath:
_outForeignPath(str, obj);
break;
@@ -3228,6 +3281,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index ae24d05..fe08107 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,43 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_OID_FIELD(tsmcost);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1216,6 +1253,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1311,6 +1349,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..c18973c 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,8 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +425,34 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+
+ /*
+ * We don't support pushing join clauses into the quals of a sample scan,
+ * but it could still have required parameterization due to LATERAL refs
+ * in its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
+
+ /*
+ * There is only one path to consider, but we still need to set the
+ * cheapest-path fields of the RelOptInfo.
+ */
+ set_cheapest(rel);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 020558b..e4025a8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -90,6 +90,7 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
#include "utils/spccache.h"
+#include "utils/tablesample.h"
#include "utils/tuplesort.h"
@@ -219,6 +220,72 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * From the planner's perspective, the cost itself matters little, since there
+ * is only ever one scan path to consider when a sample scan is requested, but
+ * the row-count estimate is still important.
+ *
+ * 'path' is the SamplePath to be costed (carries param_info, if parameterized)
+ * 'baserel' is the relation to be scanned
+ */
+void
+cost_samplescan(SamplePath *path, PlannerInfo *root, RelOptInfo *baserel)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+ BlockNumber pages;
+ double tuples;
+ SamplerAccessStrategy strategy;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Call the sampling method's costing function. */
+ OidFunctionCall6(path->tsmcost, PointerGetDatum(root),
+ PointerGetDatum(path), PointerGetDatum(baserel),
+ PointerGetDatum(&pages), PointerGetDatum(&tuples),
+ PointerGetDatum(&strategy));
+
+ /* Mark the path with the correct row estimate */
+ if (path->path.param_info)
+ path->path.rows = path->path.param_info->ppi_rows;
+ else
+ path->path.rows = tuples;
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+
+ spc_page_cost = strategy == SAS_RANDOM ?
+ spc_random_page_cost : spc_seq_page_cost;
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_page_cost * pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, path->path.param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * tuples;
+
+ path->path.startup_cost = startup_cost;
+ path->path.total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 655be81..10a5e02 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -58,6 +58,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -100,6 +102,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -228,6 +231,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -343,6 +347,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -546,6 +557,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1133,6 +1145,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3318,6 +3369,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7703946..de33fc6 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -446,6 +446,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 78fb6b1..191624c 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2163,6 +2163,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 1395a21..6206c60 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,33 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+SamplePath *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ SamplePath *pathnode = makeNode(SamplePath);
+ RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
+ TableSampleClause *tablesample = rte->tablesample;
+
+ Assert(tablesample);
+
+ pathnode->path.pathtype = T_SampleScan;
+ pathnode->path.parent = rel;
+ pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->path.pathkeys = NIL; /* samplescan has unordered result */
+
+ pathnode->tsmcost = tablesample->tsmcost;
+ pathnode->tsmargs = tablesample->args;
+
+ cost_samplescan(pathnode, root, rel);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1921,6 +1948,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return (Path *) create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 679e1bb..01d72d4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -447,6 +447,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -611,8 +612,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10227,6 +10228,12 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample opt_alias_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10522,7 +10529,6 @@ relation_expr_list:
| relation_expr_list ',' relation_expr { $$ = lappend($1, $3); }
;
-
/*
* Given "UPDATE foo set set ...", we have to decide without looking any
* further ahead whether the first "set" is an alias or the UPDATE's SET
@@ -10552,6 +10558,31 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $2;
+ n->relation = $1;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = $4;
+ n->repeatable = $6;
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13334,7 +13365,6 @@ unreserved_keyword:
| RELATIVE_P
| RELEASE
| RENAME
- | REPEATABLE
| REPLACE
| REPLICA
| RESET
@@ -13509,6 +13539,7 @@ type_func_name_keyword:
| OVERLAPS
| RIGHT
| SIMILAR
+ | TABLESAMPLE
| VERBOSE
;
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+ | REPEATABLE
| RETURNING
| SELECT
| SESSION_USER
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 654dce6..03632d2 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -413,6 +416,19 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
+{
+ RangeTblEntry *rte;
+ TableSampleClause *tablesample = NULL;
+
+ rte = transformTableEntry(pstate, r->relation);
+ tablesample = ParseTableSample(pstate, r->method, r->repeatable, r->args);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -421,7 +437,7 @@ transformTableEntry(ParseState *pstate, RangeVar *r)
{
RangeTblEntry *rte;
- /* We need only build a range table entry */
+ /* We first need to build a range table entry */
rte = addRangeTableEntry(pstate, r, r->alias,
interpretInhOption(r->inhOpt), true);
@@ -1121,6 +1137,26 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index a200804..690d0fa 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesample_method.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -760,6 +762,132 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
+ List *sampleargs)
+{
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, initnargs;
+ Oid actual_arg_types[FUNC_MAX_ARGS];
+ Oid init_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ samplemethod)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+ tablesample->tsmcost = tsm->tsmcost;
+
+ ReleaseSysCache(tuple);
+
+ /* Validate the parameters against init function definition. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ initnargs = procform->pronargs;
+ Assert(initnargs >= 3);
+
+ /*
+ * The first parameter passes the SampleScanState and the second is the
+ * seed (REPEATABLE). Skip processing them here; just assert that their
+ * types are correct.
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ Assert(procform->proargtypes.values[1] == INT4OID);
+ initnargs -= 2;
+ memcpy(init_arg_types, procform->proargtypes.values + 2,
+ initnargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Process repeatable (seed) */
+ if (repeatable != NULL)
+ {
+ Node *arg = repeatable;
+
+ if (arg && IsA(arg, A_Const))
+ {
+ A_Const *con = (A_Const *) arg;
+
+ if (con->val.type == T_Null)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value"),
+ parser_errposition(pstate, con->location)));
+
+ }
+
+ arg = transformExpr(pstate, arg, EXPR_KIND_FROM_FUNCTION);
+ arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
+ tablesample->repeatable = arg;
+ }
+ else
+ tablesample->repeatable = NULL;
+
+ /* Transform the rest of arguments ... */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *arg = transformExpr(pstate, (Node *) lfirst(larg), EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ fargs = lappend(fargs, arg);
+
+ actual_arg_types[nargs++] = argtype;
+ }
+
+ /*
+ * Check if parameters are correct.
+ *
+ * XXX: can we do better at hinting here?
+ */
+ if (initnargs != nargs ||
+ !can_coerce_type(initnargs, actual_arg_types, init_arg_types,
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameters for tablesample method \"%s\"",
+ samplemethod)));
+
+ /* perform the necessary typecasting of arguments */
+ make_fn_arguments(pstate, fargs, actual_arg_types, init_arg_types);
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..9daa2ae 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,8 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time \
+ tablesample
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index dd748ac..4f1c534 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -31,6 +31,7 @@
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -343,6 +344,8 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags);
static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags, int wrapColumn);
+static void get_tablesample_def(TableSampleClause *tablesample,
+ deparse_context *context);
static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
TupleDesc resultDesc,
int prettyFlags, int wrapColumn, int startIndent);
@@ -4157,6 +4160,50 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
heap_close(ev_relation, AccessShareLock);
}
+/* ----------
+ * get_tablesample_def - Convert TableSampleClause back to SQL
+ * ----------
+ */
+static void
+get_tablesample_def(TableSampleClause *tablesample, deparse_context *context)
+{
+ StringInfo buf = context->buf;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ char *tsmname;
+ int nargs;
+ ListCell *l;
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tablesample->tsmid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("cache lookup failed for tablesample method %u",
+ tablesample->tsmid)));
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ tsmname = NameStr(tsm->tsmname);
+ appendStringInfo(buf, " TABLESAMPLE %s (", quote_identifier(tsmname));
+
+ ReleaseSysCache(tuple);
+
+ nargs = 0;
+ foreach(l, tablesample->args)
+ {
+ if (nargs++ > 0)
+ appendStringInfoString(buf, ", ");
+ get_rule_expr((Node *) lfirst(l), context, true);
+ }
+ appendStringInfoChar(buf, ')');
+
+ if (tablesample->repeatable != NULL)
+ {
+ appendStringInfoString(buf, " REPEATABLE (");
+ get_rule_expr(tablesample->repeatable, context, true);
+ appendStringInfoChar(buf, ')');
+ }
+}
/* ----------
* get_query_def - Parse back one query parsetree
@@ -8384,6 +8431,9 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
only_marker(rte),
generate_relation_name(rte->relid,
context->namespaces));
+
+ if (rte->tablesample)
+ get_tablesample_def(rte->tablesample, context);
break;
case RTE_SUBQUERY:
/* Subquery RTE */
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..3a8f01e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesample_method_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index 1eeabaf..f213c46 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -46,6 +46,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
bs->n = samplesize;
bs->t = 0; /* blocks scanned so far */
bs->m = 0; /* blocks selected so far */
+
+ sampler_random_init_state(randseed, bs->randstate);
}
bool
@@ -92,7 +94,7 @@ BlockSampler_Next(BlockSampler bs)
* less than k, which means that we cannot fail to select enough blocks.
*----------
*/
- V = sampler_random_fract();
+ V = sampler_random_fract(bs->randstate);
p = 1.0 - (double) k / (double) K;
while (V < p)
{
@@ -126,8 +128,14 @@ BlockSampler_Next(BlockSampler bs)
void
reservoir_init_selection_state(ReservoirState rs, int n)
{
+ /*
+ * Reservoir sampling is not used anywhere that requires repeatable
+ * results, so we can initialize it randomly.
+ */
+ sampler_random_init_state(random(), rs->randstate);
+
/* Initial value of W (for use when Algorithm Z is first applied) */
- *rs = exp(-log(sampler_random_fract()) / n);
+ rs->W = exp(-log(sampler_random_fract(rs->randstate)) / n);
}
double
@@ -142,7 +150,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
double V,
quot;
- V = sampler_random_fract(); /* Generate V */
+ V = sampler_random_fract(rs->randstate); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -158,7 +166,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
else
{
/* Now apply Algorithm Z */
- double W = *rs;
+ double W = rs->W;
double term = t - (double) n + 1;
for (;;)
@@ -174,7 +182,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
tmp;
/* Generate U and X */
- U = sampler_random_fract();
+ U = sampler_random_fract(rs->randstate);
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -203,11 +211,11 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract(rs->randstate)) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
- *rs = W;
+ rs->W = W;
}
return S;
}
@@ -217,10 +225,17 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
* Random number generator used by sampling
*----------
*/
+void
+sampler_random_init_state(long seed, SamplerRandomState randstate)
+{
+ randstate[0] = 0x330e;
+ randstate[1] = (unsigned short) seed;
+ randstate[2] = (unsigned short) (seed >> 16);
+}
/* Select a random value R uniformly distributed in (0 - 1) */
double
-sampler_random_fract()
+sampler_random_fract(SamplerRandomState randstate)
{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+ return pg_erand48(randstate);
}
diff --git a/src/backend/utils/tablesample/Makefile b/src/backend/utils/tablesample/Makefile
new file mode 100644
index 0000000..df92939
--- /dev/null
+++ b/src/backend/utils/tablesample/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/tablesample
+#
+# IDENTIFICATION
+# src/backend/utils/tablesample/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = system.o bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/tablesample/bernoulli.c b/src/backend/utils/tablesample/bernoulli.c
new file mode 100644
index 0000000..3720b87
--- /dev/null
+++ b/src/backend/utils/tablesample/bernoulli.c
@@ -0,0 +1,203 @@
+/*-------------------------------------------------------------------------
+ *
+ * bernoulli.c
+ * interface routines for BERNOULLI tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ float4 samplesize; /* fraction of tuples to return (0.0-1.0) */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} BernoulliSamplerData;
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ Relation rel = scanstate->ss.ss_currentRelation;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks = RelationGetNumberOfBlocks(rel);
+ sampler->blockno = InvalidBlockNumber;
+ sampler->samplesize = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = 0;
+ else if (++sampler->blockno >= sampler->tblocks)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ *
+ * This method implements the main logic of Bernoulli sampling.
+ * The algorithm simply generates a new random number (in the 0.0-1.0 range)
+ * and, if it falls within the user-specified probability (in the same range),
+ * returns the tuple offset.
+ *
+ * If we reach the end of the block, return InvalidOffsetNumber, which tells
+ * SampleScan to go to the next block.
+ */
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ double samplesize = sampler->samplesize;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /* Every tuple has a samplesize probability of being returned */
+ while (sampler_random_fract(sampler->randstate) > samplesize)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_bernoulli_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ SamplePath *path = (SamplePath *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(3);
+ double *tuples = (double *) PG_GETARG_POINTER(4);
+ List *args = path->tsmargs;
+ Node *pctnode;
+ float4 samplesize;
+
+ SamplerAccessStrategy *strategy =
+ (SamplerAccessStrategy *) PG_GETARG_POINTER(5);
+
+ *strategy = SAS_SEQUENTIAL;
+ *pages = baserel->pages;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (!IsA(pctnode, Const))
+ {
+ *tuples = baserel->tuples * 0.1;
+ PG_RETURN_VOID();
+ }
+
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+
+ *tuples = baserel->tuples * samplesize;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/backend/utils/tablesample/system.c b/src/backend/utils/tablesample/system.c
new file mode 100644
index 0000000..1899e84
--- /dev/null
+++ b/src/backend/utils/tablesample/system.c
@@ -0,0 +1,188 @@
+/*-------------------------------------------------------------------------
+ *
+ * system.c
+ * interface routines for SYSTEM tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/*
+ * State
+ */
+typedef struct
+{
+ BlockSamplerData bs;
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ int samplesize; /* number of blocks to return */
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks =
+ RelationGetNumberOfBlocks(scanstate->ss.ss_currentRelation);
+ sampler->samplesize = 1 + (int) (sampler->tblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses the same logic as ANALYZE for picking the random blocks.
+ */
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ SamplePath *path = (SamplePath *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(3);
+ double *tuples = (double *) PG_GETARG_POINTER(4);
+ List *args = path->tsmargs;
+ Node *pctnode;
+ float4 percent;
+
+ SamplerAccessStrategy *strategy =
+ (SamplerAccessStrategy *) PG_GETARG_POINTER(5);
+
+ *strategy = SAS_RANDOM;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (!IsA(pctnode, Const))
+ {
+ *pages = baserel->pages * 0.1;
+ *tuples = baserel->tuples * 0.1;
+ PG_RETURN_VOID();
+ }
+
+ percent = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ percent /= 100.0;
+
+ *pages = baserel->pages * percent;
+ *tuples = baserel->tuples * percent;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..c711cca 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_name_index, 3281, on pg_tablesample_method using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3281
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_oid_index, 3282, on pg_tablesample_method using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3282
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 9edfdb8..a0f97ac 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5143,6 +5143,31 @@ DESCR("rank of hypothetical row without gaps");
DATA(insert OID = 3993 ( dense_rank_final PGNSP PGUID 12 1 0 2276 0 f f f f f f i 2 0 20 "2281 2276" "{2281,2276}" "{i,v}" _null_ _null_ hypothetical_dense_rank_final _null_ _null_ _null_ ));
DESCR("aggregate final function");
+DATA(insert OID = 3285 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3286 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3287 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3288 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3289 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+DATA(insert OID = 3290 ( tsm_system_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 6 0 2278 "2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_system_cost _null_ _null_ _null_ ));
+DESCR("tsm_system_cost(internal)");
+
+DATA(insert OID = 3291 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3292 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3293 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3294 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3296 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
+DATA(insert OID = 3297 ( tsm_bernoulli_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 6 0 2278 "2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_cost(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
new file mode 100644
index 0000000..0e4a716
--- /dev/null
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesample_method.h
+ * definition of the table sampling methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesample_method.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLE_METHOD_H
+#define PG_TABLESAMPLE_METHOD_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_tablesample_method definition. cpp turns this into
+ * typedef struct FormData_pg_tablesample_method
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3280
+
+CATALOG(pg_tablesample_method,3280)
+{
+ NameData tsmname; /* tablesample method name */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockNumber if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmend; /* end scan function */
+ regproc tsmreset; /* reset state - used by rescan */
+ regproc tsmcost; /* costing function */
+} FormData_pg_tablesample_method;
+
+/* ----------------
+ * Form_pg_tablesample_method corresponds to a pointer to a tuple with
+ * the format of pg_tablesample_method relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
+
+/* ----------------
+ * compiler constants for pg_tablesample_method
+ * ----------------
+ */
+#define Natts_pg_tablesample_method 7
+#define Anum_pg_tablesample_method_tsmname 1
+#define Anum_pg_tablesample_method_tsminit 2
+#define Anum_pg_tablesample_method_tsmnextblock 3
+#define Anum_pg_tablesample_method_tsmnexttuple 4
+#define Anum_pg_tablesample_method_tsmend 5
+#define Anum_pg_tablesample_method_tsmreset 6
+#define Anum_pg_tablesample_method_tsmcost 7
+
+/* ----------------
+ * initial contents of pg_tablesample_method
+ * ----------------
+ */
+
+DATA(insert OID = 3283 ( system tsm_system_init tsm_system_nextblock tsm_system_nexttuple tsm_system_end tsm_system_reset tsm_system_cost ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3284 ( bernoulli tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41288ed..43e1a30 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1212,6 +1212,26 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+
+ /* Sampling method functions. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+
+ Buffer openbuffer; /* currently open buffer */
+ HeapTupleData tup; /* last tuple */
+
+ void *tsmdata; /* for use by table sampling method */
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 97ef0fc..99ac985 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,6 +61,7 @@ typedef enum NodeTag
T_ValuesScan,
T_CteScan,
T_WorkTableScan,
+ T_SampleScan,
T_ForeignScan,
T_CustomScan,
T_Join,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -225,6 +227,7 @@ typedef enum NodeTag
T_MergePath,
T_HashPath,
T_TidPath,
+ T_SamplePath,
T_ForeignPath,
T_CustomPath,
T_AppendPath,
@@ -413,6 +416,8 @@ typedef enum NodeTag
T_XmlSerialize,
T_WithClause,
T_CommonTableExpr,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b1dfa85..d87343f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -307,6 +307,23 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - information about a sampling method
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmend;
+ Oid tsmreset;
+ Oid tsmcost;
+ Node *repeatable;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -507,6 +524,21 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than the SQL standard, so we pass generic function
+ * arguments to the sampling method.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ Node *repeatable;
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -751,6 +783,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 316c9ce..8a2a146 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -278,6 +278,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6845a40..67c3b1f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -870,6 +870,18 @@ typedef struct TidPath
} TidPath;
/*
+ * SamplePath represents a sample scan
+ *
+ * tsmargs is the list of parameters for the TABLESAMPLE clause
+ */
+typedef struct SamplePath
+{
+ Path path;
+ Oid tsmcost; /* table sample method costing function */
+ List *tsmargs; /* arguments to a TABLESAMPLE clause */
+} SamplePath;
+
+/*
* ForeignPath represents a potential scan of a foreign table
*
* fdw_private stores FDW private data about the scan. While fdw_private is
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..3777054 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(SamplePath *path, PlannerInfo *root, RelOptInfo *baserel);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9923f0e..dfb580e 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern SamplePath *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c243ec..6ff7b44 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -312,7 +312,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD)
PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD)
PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD)
-PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD)
+PG_KEYWORD("repeatable", REPEATABLE, RESERVED_KEYWORD)
PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD)
PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD)
PG_KEYWORD("reset", RESET, UNRESERVED_KEYWORD)
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 3264691..6727e55 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,10 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod,
+ Node *repeatable, List *args);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 6bd786d..185bd81 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index e3e7f9c..4ac208d 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -15,7 +15,12 @@
#include "storage/bufmgr.h"
-extern double sampler_random_fract(void);
+/* Random generator for sampling code */
+typedef unsigned short SamplerRandomState[3];
+
+extern void sampler_random_init_state(long seed,
+ SamplerRandomState randstate);
+extern double sampler_random_fract(SamplerRandomState randstate);
/* Block sampling methods */
/* Data structure for Algorithm S from Knuth 3.4.2 */
@@ -25,6 +30,7 @@ typedef struct
int n; /* desired sample size */
BlockNumber t; /* current block number */
int m; /* blocks selected so far */
+ SamplerRandomState randstate; /* random generator state */
} BlockSamplerData;
typedef BlockSamplerData *BlockSampler;
@@ -35,7 +41,12 @@ extern bool BlockSampler_HasMore(BlockSampler bs);
extern BlockNumber BlockSampler_Next(BlockSampler bs);
/* Reservoid sampling methods */
-typedef double ReservoirStateData;
+typedef struct
+{
+ double W;
+ SamplerRandomState randstate; /* random generator state */
+} ReservoirStateData;
+
typedef ReservoirStateData *ReservoirState;
extern void reservoir_init_selection_state(ReservoirState rs, int n);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..6b628f6 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/include/utils/tablesample.h b/src/include/utils/tablesample.h
new file mode 100644
index 0000000..3c97a04
--- /dev/null
+++ b/src/include/utils/tablesample.h
@@ -0,0 +1,33 @@
+/*--------------------------------------------------------------------------
+ * tablesample.h
+ * Header file for builtin table sampling methods.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/utils/tablesample.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TABLESAMPLE_H
+#define TABLESAMPLE_H
+
+typedef enum SamplerAccessStrategy
+{
+ SAS_RANDOM,
+ SAS_SEQUENTIAL
+} SamplerAccessStrategy;
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_system_cost(PG_FUNCTION_ARGS);
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_cost(PG_FUNCTION_ARGS);
+
+#endif /* TABLESAMPLE_H */
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..5946edf 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesample_method|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..9b387a2
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,165 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+ count
+-------
+ 10
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+ pg_get_viewdef
+--------------------------------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system (((10 * 2))::real) REPEATABLE (2);
+(1 row)
+
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+ pg_get_viewdef
+-----------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system ((99)::real);
+(1 row)
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+CLOSE tablesample_cur;
+END;
+DROP TABLE test_tablesample CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to view test_tablesample_v1
+drop cascades to view test_tablesample_v2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e0ae2f2..e0240ac 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address tablesample
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7f762bd..9a7611b 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -152,3 +152,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..2b89b55
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,39 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+CLOSE tablesample_cur;
+END;
+
+DROP TABLE test_tablesample CASCADE;
--
1.9.1
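The regression test above relies on two properties: a fixed REPEATABLE seed selects the same rows every time, and rewinding the cursor with FETCH FIRST replays the same sample. Below is a minimal, self-contained sketch of how a seeded Bernoulli-style sampler can provide both. It is not the patch's code. The names DemoSampler, demo_init, demo_reset, demo_uniform, demo_sample and demo_rescan_matches are hypothetical, and the generator is a plain 64-bit LCG chosen only to keep the example deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ROWS 64

/* Hypothetical sampler state; not the struct used by the patch. */
typedef struct DemoSampler
{
	uint64_t	seed;		/* value supplied in REPEATABLE (...) */
	uint64_t	state;		/* current generator state */
	double		fraction;	/* selection probability, 0.0 to 1.0 */
} DemoSampler;

/* Rewind the generator, as a reset callback would on a cursor rescan. */
void
demo_reset(DemoSampler *s)
{
	s->state = s->seed;
}

/* Store the seed and the percentage, then start from a clean state. */
void
demo_init(DemoSampler *s, uint64_t seed, double percent)
{
	assert(percent >= 0.0 && percent <= 100.0);
	s->seed = seed;
	s->fraction = percent / 100.0;
	demo_reset(s);
}

/* One LCG step; the top 53 bits are scaled to a double in [0, 1). */
double
demo_uniform(DemoSampler *s)
{
	s->state = s->state * 6364136223846793005ULL + 1442695040888963407ULL;
	return (double) (s->state >> 11) / 9007199254740992.0;	/* 2^53 */
}

/* Visit ids 0..nrows-1 and keep each one with probability "fraction". */
size_t
demo_sample(DemoSampler *s, int nrows, int *out)
{
	size_t		n = 0;
	int			id;

	for (id = 0; id < nrows; id++)
	{
		if (demo_uniform(s) < s->fraction)
			out[n++] = id;
	}
	return n;
}

/* Return 1 if two scans separated by a reset select identical rows. */
int
demo_rescan_matches(DemoSampler *s, int nrows)
{
	int			first[DEMO_MAX_ROWS];
	int			second[DEMO_MAX_ROWS];
	size_t		nfirst;
	size_t		nsecond;
	size_t		i;

	assert(nrows >= 0 && nrows <= DEMO_MAX_ROWS);

	demo_reset(s);
	nfirst = demo_sample(s, nrows, first);
	demo_reset(s);
	nsecond = demo_sample(s, nrows, second);

	if (nfirst != nsecond)
		return 0;
	for (i = 0; i < nfirst; i++)
	{
		if (first[i] != second[i])
			return 0;
	}
	return 1;
}
```

Because the only state is the seed and the generator position, a reset is a single assignment. That is roughly the job the reset function declared in tablesample.h would have to do for the cursor test to return rows 0, 1, 2 again after FETCH FIRST.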
0003-tablesample-ddl-v2.patch (text/x-diff)
From 4c7282395650bd1f158fba4d0f8508b5eef84f9c Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:51:44 +0100
Subject: [PATCH 3/3] tablesample-ddl v2
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_tablesamplemethod.sgml | 149 ++++++++
doc/src/sgml/ref/drop_tablesamplemethod.sgml | 87 +++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/dependency.c | 15 +-
src/backend/catalog/objectaddress.c | 65 +++-
src/backend/commands/Makefile | 6 +-
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/tablecmds.c | 1 +
src/backend/commands/tablesample.c | 398 +++++++++++++++++++++
src/backend/parser/gram.y | 14 +-
src/backend/tcop/utility.c | 12 +
src/include/catalog/dependency.h | 1 +
src/include/catalog/pg_tablesample_method.h | 11 +
src/include/nodes/parsenodes.h | 1 +
src/include/nodes/relation.h | 2 +-
src/include/parser/kwlist.h | 1 +
src/test/modules/Makefile | 3 +-
src/test/modules/tablesample/.gitignore | 4 +
src/test/modules/tablesample/Makefile | 21 ++
.../modules/tablesample/expected/tablesample.out | 39 ++
src/test/modules/tablesample/sql/tablesample.sql | 14 +
src/test/modules/tablesample/tsm_test--1.0.sql | 44 +++
src/test/modules/tablesample/tsm_test.c | 180 ++++++++++
src/test/modules/tablesample/tsm_test.control | 5 +
26 files changed, 1075 insertions(+), 9 deletions(-)
create mode 100644 doc/src/sgml/ref/create_tablesamplemethod.sgml
create mode 100644 doc/src/sgml/ref/drop_tablesamplemethod.sgml
create mode 100644 src/backend/commands/tablesample.c
create mode 100644 src/test/modules/tablesample/.gitignore
create mode 100644 src/test/modules/tablesample/Makefile
create mode 100644 src/test/modules/tablesample/expected/tablesample.out
create mode 100644 src/test/modules/tablesample/sql/tablesample.sql
create mode 100644 src/test/modules/tablesample/tsm_test--1.0.sql
create mode 100644 src/test/modules/tablesample/tsm_test.c
create mode 100644 src/test/modules/tablesample/tsm_test.control
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 7aa3128..2fad084 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -78,6 +78,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createServer SYSTEM "create_server.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
+<!ENTITY createTablesampleMethod SYSTEM "create_tablesamplemethod.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
<!ENTITY createTrigger SYSTEM "create_trigger.sgml">
<!ENTITY createTSConfig SYSTEM "create_tsconfig.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
+<!ENTITY dropTablesampleMethod SYSTEM "drop_tablesamplemethod.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTrigger SYSTEM "drop_trigger.sgml">
<!ENTITY dropTSConfig SYSTEM "drop_tsconfig.sgml">
diff --git a/doc/src/sgml/ref/create_tablesamplemethod.sgml b/doc/src/sgml/ref/create_tablesamplemethod.sgml
new file mode 100644
index 0000000..70720e5
--- /dev/null
+++ b/doc/src/sgml/ref/create_tablesamplemethod.sgml
@@ -0,0 +1,149 @@
+<!--
+doc/src/sgml/ref/create_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATETABLESAMPLEMETHOD">
+ <indexterm zone="sql-createtablesamplemethod">
+ <primary>CREATE TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE TABLESAMPLE METHOD</refname>
+ <refpurpose>define a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE TABLESAMPLE METHOD <replaceable class="parameter">name</replaceable> (
+ INIT = <replaceable class="parameter">init_function</replaceable> ,
+ NEXTBLOCK = <replaceable class="parameter">nextblock_function</replaceable> ,
+ NEXTTUPLE = <replaceable class="parameter">nexttuple_function</replaceable> ,
+ END = <replaceable class="parameter">end_function</replaceable> ,
+ RESET = <replaceable class="parameter">reset_function</replaceable> ,
+ COST = <replaceable class="parameter">cost_function</replaceable>
+)
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>CREATE TABLESAMPLE METHOD</command> creates a tablesample method.
+ A tablesample method provides an algorithm for reading a sample of a table
+ when used in the <command>TABLESAMPLE</> clause of a <command>SELECT</>
+ statement.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>CREATE TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of the tablesample method to be created. This name must be
+ unique within the database.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">init_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the init function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nextblock_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-block function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nexttuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-tuple function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">end_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the end function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">reset_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the reset function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">cost_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the costing function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>
+ The function names can be schema-qualified if necessary. Argument types
+ are not given, since the argument list for each type of function is
+ predetermined. All functions are required.
+ </para>
+
+ <para>
+ The arguments can appear in any order, not only the one shown above.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>CREATE TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-droptablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_tablesamplemethod.sgml b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
new file mode 100644
index 0000000..dffd2ec
--- /dev/null
+++ b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
@@ -0,0 +1,87 @@
+<!--
+doc/src/sgml/ref/drop_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPTABLESAMPLEMETHOD">
+ <indexterm zone="sql-droptablesamplemethod">
+ <primary>DROP TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP TABLESAMPLE METHOD</refname>
+ <refpurpose>remove a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP TABLESAMPLE METHOD [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP TABLESAMPLE METHOD</command> drops an existing tablesample
+ method.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>DROP TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the tablesample method does not exist.
+ A notice is issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an existing tablesample method to be removed.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>DROP TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createtablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 10c9a6d..2c09a3c 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -106,6 +106,7 @@
&createServer;
&createTable;
&createTableAs;
+ &createTablesampleMethod;
&createTableSpace;
&createTSConfig;
&createTSDictionary;
@@ -147,6 +148,7 @@
&dropSequence;
&dropServer;
&dropTable;
+ &dropTablesampleMethod;
&dropTableSpace;
&dropTSConfig;
&dropTSDictionary;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index bacb242..6acb5b3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -46,6 +46,7 @@
#include "catalog/pg_policy.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -157,7 +158,8 @@ static const Oid object_classes[MAX_OCLASS] = {
DefaultAclRelationId, /* OCLASS_DEFACL */
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
- PolicyRelationId /* OCLASS_POLICY */
+ PolicyRelationId, /* OCLASS_POLICY */
+ TableSampleMethodRelationId /* OCLASS_TABLESAMPLEMETHOD */
};
@@ -1265,6 +1267,10 @@ doDeletion(const ObjectAddress *object, int flags)
RemovePolicyById(object->objectId);
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ RemoveTablesampleMethodById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -1794,6 +1800,10 @@ find_expr_references_walker(Node *node,
case RTE_RELATION:
add_object_address(OCLASS_CLASS, rte->relid, 0,
context->addrs);
+ if (rte->tablesample)
+ add_object_address(OCLASS_TABLESAMPLEMETHOD,
+ rte->tablesample->tsmid, 0,
+ context->addrs);
break;
default:
break;
@@ -2373,6 +2383,9 @@ getObjectClass(const ObjectAddress *object)
case PolicyRelationId:
return OCLASS_POLICY;
+
+ case TableSampleMethodRelationId:
+ return OCLASS_TABLESAMPLEMETHOD;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 825d8b2..02edc0a 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -44,6 +44,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_policy.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -429,7 +430,19 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
- }
+ },
+ {
+ TableSampleMethodRelationId,
+ TableSampleMethodOidIndexId,
+ TABLESAMPLEMETHODOID,
+ TABLESAMPLEMETHODNAME,
+ Anum_pg_tablesample_method_tsmname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ true
+ },
};
/*
@@ -528,7 +541,9 @@ ObjectTypeMap[] =
/* OCLASS_EVENT_TRIGGER */
{ "event trigger", OBJECT_EVENT_TRIGGER },
/* OCLASS_POLICY */
- { "policy", OBJECT_POLICY }
+ { "policy", OBJECT_POLICY },
+ /* OCLASS_TABLESAMPLEMETHOD */
+ { "tablesample method", OBJECT_TABLESAMPLEMETHOD }
};
@@ -670,6 +685,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_FDW:
case OBJECT_FOREIGN_SERVER:
case OBJECT_EVENT_TRIGGER:
+ case OBJECT_TABLESAMPLEMETHOD:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -896,6 +912,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_EVENT_TRIGGER:
msg = gettext_noop("event trigger name cannot be qualified");
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -956,6 +975,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_event_trigger_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ address.classId = TableSampleMethodRelationId;
+ address.objectId = get_tablesample_method_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -1720,6 +1744,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
break;
case OBJECT_TSPARSER:
case OBJECT_TSTEMPLATE:
+ case OBJECT_TABLESAMPLEMETHOD:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -2654,6 +2679,21 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("tablesample method %s"),
+ NameStr(((Form_pg_tablesample_method) GETSTRUCT(tup))->tsmname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -3131,6 +3171,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "policy");
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ appendStringInfoString(&buffer, "tablesample method");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4025,6 +4069,23 @@ getObjectIdentityParts(const ObjectAddress *object,
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+ Form_pg_tablesample_method tsmForm;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ tsmForm = (Form_pg_tablesample_method) GETSTRUCT(tup);
+ appendStringInfoString(&buffer,
+ quote_identifier(NameStr(tsmForm->tsmname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..04fcd8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o tablecmds.o tablesample.o \
+ tablespace.o trigger.o tsearchcmds.o typecmds.o user.o vacuum.o \
+ vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index e5185ba..04d29a2 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -421,6 +421,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
args = strVal(linitial(objargs));
}
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
default:
elog(ERROR, "unexpected object type (%d)", (int) objtype);
break;
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index a33a5ad..f20e9f7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -97,6 +97,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SEQUENCE", true},
{"SERVER", true},
{"TABLE", true},
+ {"TABLESAMPLE METHOD", true},
{"TABLESPACE", false},
{"TRIGGER", true},
{"TEXT SEARCH CONFIGURATION", true},
@@ -1078,6 +1079,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_SEQUENCE:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
+ case OBJECT_TABLESAMPLEMETHOD:
case OBJECT_TRIGGER:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
@@ -1134,6 +1136,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_TABLESAMPLEMETHOD:
return true;
case MAX_OCLASS:
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 66d5083..b67c560 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8059,6 +8059,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
case OCLASS_USER_MAPPING:
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
+ case OCLASS_TABLESAMPLEMETHOD:
/*
* We don't expect any of these sorts of objects to depend on
diff --git a/src/backend/commands/tablesample.c b/src/backend/commands/tablesample.c
new file mode 100644
index 0000000..3fcd0bd
--- /dev/null
+++ b/src/backend/commands/tablesample.c
@@ -0,0 +1,398 @@
+/*-------------------------------------------------------------------------
+ *
+ * tablesample.c
+ * Commands to manipulate tablesample methods
+ *
+ * Table sampling methods provide algorithms for doing sample scan over
+ * the table.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/tablesample.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "parser/parse_func.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+
+
+static Datum
+get_tabmesample_method_func(DefElem *defel, int attnum)
+{
+ List *funcName = defGetQualifiedName(defel);
+ /* Big enough size for our needs. */
+ Oid *typeId = palloc0(6 * sizeof(Oid));
+ Oid retTypeId;
+ int nargs;
+ Oid procOid = InvalidOid;
+ FuncCandidateList clist;
+
+ switch (attnum)
+ {
+ case Anum_pg_tablesample_method_tsminit:
+ /*
+ * tsminit needs special handling because it is defined as a function
+ * with 3 or more arguments, of which only the first two must have a
+ * specific type; the rest are up to the tablesample method creator.
+ */
+ {
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ retTypeId = VOIDOID;
+
+ clist = FuncnameGetCandidates(funcName, -1, NIL, false, false, false);
+
+ while (clist)
+ {
+ if (clist->nargs >= 3 &&
+ memcmp(typeId, clist->args, nargs * sizeof(Oid)) == 0)
+ {
+ procOid = clist->oid;
+ /* Save real function signature for future errors. */
+ nargs = clist->nargs;
+ pfree(typeId);
+ typeId = clist->args;
+ break;
+ }
+ clist = clist->next;
+ }
+
+ if (!OidIsValid(procOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("function \"%s\" does not exist or does not have a valid signature",
+ NameListToString(funcName)),
+ errhint("The tablesample method init function "
+ "must have at least 3 input parameters, "
+ "with the first of type INTERNAL and the second of type INTEGER.")));
+ }
+ break;
+
+ case Anum_pg_tablesample_method_tsmnextblock:
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = BOOLOID;
+ retTypeId = INT4OID;
+ break;
+ case Anum_pg_tablesample_method_tsmnexttuple:
+ nargs = 4;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INT2OID;
+ typeId[3] = BOOLOID;
+ retTypeId = INT2OID;
+ break;
+ case Anum_pg_tablesample_method_tsmend:
+ case Anum_pg_tablesample_method_tsmreset:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ case Anum_pg_tablesample_method_tsmcost:
+ nargs = 6;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INTERNALOID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = INTERNALOID;
+ typeId[4] = INTERNALOID;
+ typeId[5] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ default:
+ /* should not be here */
+ elog(ERROR, "unrecognized attribute for tablesample method: %d",
+ attnum);
+ nargs = 0; /* keep compiler quiet */
+ }
+
+ if (!OidIsValid(procOid))
+ procOid = LookupFuncName(funcName, nargs, typeId, false);
+ if (get_func_rettype(procOid) != retTypeId)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("function %s should return type %s",
+ func_signature_string(funcName, nargs, NIL, typeId),
+ format_type_be(retTypeId))));
+
+ return ObjectIdGetDatum(procOid);
+}
+
+/*
+ * make pg_depend entries for a new pg_tablesample_method entry
+ */
+static void
+makeTablesampleMethodDeps(HeapTuple tuple)
+{
+ Form_pg_tablesample_method tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ ObjectAddress myself,
+ referenced;
+
+ myself.classId = TableSampleMethodRelationId;
+ myself.objectId = HeapTupleGetOid(tuple);
+ myself.objectSubId = 0;
+
+ /* dependency on extension */
+ recordDependencyOnCurrentExtension(&myself, false);
+
+ /* dependencies on functions */
+ referenced.classId = ProcedureRelationId;
+ referenced.objectSubId = 0;
+
+ referenced.objectId = tsm->tsminit;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnextblock;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnexttuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmend;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmreset;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmcost;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+}
+
+/*
+ * Create a table sampling method
+ *
+ * Only superusers can create table sampling methods.
+ */
+Oid
+DefineTablesampleMethod(List *names, List *parameters)
+{
+ char *tsmname = strVal(linitial(names));
+ Oid tsmoid;
+ ListCell *pl;
+ Relation rel;
+ Datum values[Natts_pg_tablesample_method];
+ bool nulls[Natts_pg_tablesample_method];
+ HeapTuple tuple;
+
+ /* Must be superuser. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to create tablesample method \"%s\"",
+ tsmname),
+ errhint("Must be superuser to create a tablesample method.")));
+
+ /* Must not already exist. */
+ tsmoid = get_tablesample_method_oid(tsmname, true);
+ if (OidIsValid(tsmoid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("tablesample method \"%s\" already exists",
+ tsmname)));
+
+ /* Initialize the values. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_tablesample_method_tsmname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(tsmname));
+
+ /*
+ * loop over the definition list and extract the information we need.
+ */
+ foreach(pl, parameters)
+ {
+ DefElem *defel = (DefElem *) lfirst(pl);
+
+ if (pg_strcasecmp(defel->defname, "init") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsminit - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsminit);
+ }
+ else if (pg_strcasecmp(defel->defname, "nextblock") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnextblock - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnextblock);
+ }
+ else if (pg_strcasecmp(defel->defname, "nexttuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnexttuple - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnexttuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "end") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmend - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmend);
+ }
+ else if (pg_strcasecmp(defel->defname, "reset") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmreset - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmreset);
+ }
+ else if (pg_strcasecmp(defel->defname, "cost") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmcost - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmcost);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("tablesample method parameter \"%s\" not recognized",
+ defel->defname)));
+ }
+
+ /*
+ * Validation.
+ */
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsminit - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method init function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnextblock - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nextblock function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnexttuple - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nexttuple function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmend - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method end function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmreset - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method reset function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmcost - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method cost function is required")));
+
+ /*
+ * Insert tuple into pg_tablesample_method.
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = heap_form_tuple(rel->rd_att, values, nulls);
+
+ tsmoid = simple_heap_insert(rel, tuple);
+
+ CatalogUpdateIndexes(rel, tuple);
+
+ makeTablesampleMethodDeps(tuple);
+
+ heap_freetuple(tuple);
+
+ /* Post creation hook for new tablesample method */
+ InvokeObjectPostCreateHook(TableSampleMethodRelationId, tsmoid, 0);
+
+ heap_close(rel, RowExclusiveLock);
+
+ return tsmoid;
+}
+
+/*
+ * Drop a tablesample method.
+ */
+void
+RemoveTablesampleMethodById(Oid tsmoid)
+{
+ Relation rel;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+
+ /*
+ * Find the target tuple
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tsmoid));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ tsmoid);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ /* Can't drop builtin tablesample methods. */
+ if (tsmoid == TABLESAMPLE_METHOD_SYSTEM_OID ||
+ tsmoid == TABLESAMPLE_METHOD_BERNOULLI_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied for tablesample method %s",
+ NameStr(tsm->tsmname))));
+
+ /*
+ * Remove the pg_tablesample_method tuple.
+ */
+ simple_heap_delete(rel, &tuple->t_self);
+
+ ReleaseSysCache(tuple);
+
+ heap_close(rel, RowExclusiveLock);
+}
+
+/*
+ * get_tablesample_method_oid - given a tablesample method name,
+ * look up the OID
+ *
+ * If missing_ok is false, throw an error if tablesample method name not found.
+ * If true, just return InvalidOid.
+ */
+Oid
+get_tablesample_method_oid(const char *tsmname, bool missing_ok)
+{
+ Oid result;
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(tsmname));
+ if (HeapTupleIsValid(tuple))
+ {
+ result = HeapTupleGetOid(tuple);
+ ReleaseSysCache(tuple);
+ }
+ else
+ result = InvalidOid;
+
+ if (!OidIsValid(result) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ tsmname)));
+
+ return result;
+}
+
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 01d72d4..4bf4aff 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -586,7 +586,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
- MAPPING MATCH MATERIALIZED MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
+ MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P
+ MOVE
NAME_P NAMES NATIONAL NATURAL NCHAR NEXT NO NONE
NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -5094,6 +5095,15 @@ DefineStmt:
n->definition = list_make1(makeDefElem("from", (Node *) $5));
$$ = (Node *)n;
}
+ | CREATE TABLESAMPLE METHOD name definition
+ {
+ DefineStmt *n = makeNode(DefineStmt);
+ n->kind = OBJECT_TABLESAMPLEMETHOD;
+ n->args = NIL;
+ n->defnames = list_make1(makeString($4));
+ n->definition = $5;
+ $$ = (Node *)n;
+ }
;
definition: '(' def_list ')' { $$ = $2; }
@@ -5552,6 +5562,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | TABLESAMPLE METHOD { $$ = OBJECT_TABLESAMPLEMETHOD; }
;
any_name_list:
@@ -13313,6 +13324,7 @@ unreserved_keyword:
| MATCH
| MATERIALIZED
| MAXVALUE
+ | METHOD
| MINUTE_P
| MINVALUE
| MODE
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 3533cfa..532256d 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -23,6 +23,7 @@
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/namespace.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/toasting.h"
#include "commands/alter.h"
#include "commands/async.h"
@@ -1106,6 +1107,11 @@ ProcessUtilitySlow(Node *parsetree,
Assert(stmt->args == NIL);
DefineCollation(stmt->defnames, stmt->definition);
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ Assert(stmt->args == NIL);
+ Assert(list_length(stmt->defnames) == 1);
+ DefineTablesampleMethod(stmt->defnames, stmt->definition);
+ break;
default:
elog(ERROR, "unrecognized define stmt type: %d",
(int) stmt->kind);
@@ -1960,6 +1966,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_POLICY:
tag = "DROP POLICY";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "DROP TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
@@ -2056,6 +2065,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_COLLATION:
tag = "CREATE COLLATION";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "CREATE TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 6481ac8..30653f8 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -148,6 +148,7 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
+ OCLASS_TABLESAMPLEMETHOD, /* pg_tablesample_method */
MAX_OCLASS /* MUST BE LAST */
} ObjectClass;
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
index 0e4a716..4ae8364 100644
--- a/src/include/catalog/pg_tablesample_method.h
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -64,7 +64,18 @@ typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
DATA(insert OID = 3283 ( system tsm_system_init tsm_system_nextblock tsm_system_nexttuple tsm_system_end tsm_system_reset tsm_system_cost ));
DESCR("SYSTEM table sampling method");
+#define TABLESAMPLE_METHOD_SYSTEM_OID 3283
DATA(insert OID = 3284 ( bernoulli tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
DESCR("BERNOULLI table sampling method");
+#define TABLESAMPLE_METHOD_BERNOULLI_OID 3284
+
+/* ----------------
+ * functions for manipulation of pg_tablesample_method
+ * ----------------
+ */
+
+extern Oid DefineTablesampleMethod(List *names, List *parameters);
+extern void RemoveTablesampleMethodById(Oid tsmoid);
+extern Oid get_tablesample_method_oid(const char *tsmname, bool missing_ok);
#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d87343f..54c4ba5 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1267,6 +1267,7 @@ typedef enum ObjectType
OBJECT_SEQUENCE,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
+ OBJECT_TABLESAMPLEMETHOD,
OBJECT_TABLESPACE,
OBJECT_TRIGGER,
OBJECT_TSCONFIGURATION,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 67c3b1f..33c0f3d 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -877,7 +877,7 @@ typedef struct TidPath
typedef struct SamplePath
{
Path path;
- Oid tsmcost; /* table sample method costing function */
+ Oid tsmcost; /* tablesample method costing function */
List *tsmargs; /* arguments to a TABLESAMPLE clause */
} SamplePath;
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 6ff7b44..c3269c0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -236,6 +236,7 @@ PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD)
+PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD)
PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("minvalue", MINVALUE, UNRESERVED_KEYWORD)
PG_KEYWORD("mode", MODE, UNRESERVED_KEYWORD)
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 93d93af..37ea524 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -9,7 +9,8 @@ SUBDIRS = \
worker_spi \
dummy_seclabel \
test_shm_mq \
- test_parser
+ test_parser \
+ tablesample
all: submake-errcodes
diff --git a/src/test/modules/tablesample/.gitignore b/src/test/modules/tablesample/.gitignore
new file mode 100644
index 0000000..5dcb3ff
--- /dev/null
+++ b/src/test/modules/tablesample/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/tablesample/Makefile b/src/test/modules/tablesample/Makefile
new file mode 100644
index 0000000..469b004
--- /dev/null
+++ b/src/test/modules/tablesample/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/tablesample/Makefile
+
+MODULE_big = tsm_test
+OBJS = tsm_test.o $(WIN32RES)
+PGFILEDESC = "tsm_test - example of a custom tablesample method"
+
+EXTENSION = tsm_test
+DATA = tsm_test--1.0.sql
+
+REGRESS = tablesample
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/tablesample/expected/tablesample.out b/src/test/modules/tablesample/expected/tablesample.out
new file mode 100644
index 0000000..95c8036
--- /dev/null
+++ b/src/test/modules/tablesample/expected/tablesample.out
@@ -0,0 +1,39 @@
+CREATE EXTENSION tsm_test;
+CREATE TABLE test_tsm AS SELECT md5(i::text) a FROM generate_series(1,10) g(i);
+SELECT * FROM test_tsm TABLESAMPLE tsm_test(true);
+ a
+----------------------------------
+ c4ca4238a0b923820dcc509a6f75849b
+ c81e728d9d4c2f636f067f89cc14862c
+ eccbc87e4b5ce2fe28308fd9f2a7baf3
+ a87ff679a2f3e71d9181a67b7542122c
+ e4da3b7fbbce2345d7772b0674a318d5
+ 1679091c5a880faf6fb5e6087eb1b2dc
+ 8f14e45fceea167a5a36dedd4bea2543
+ c9f0f895fb98ab9159f51fd0297e236d
+ 45c48cce2e2d7fbdea1afc51c7c6ad26
+ d3d9446802a44259755d38e6d163e820
+(10 rows)
+
+SELECT * FROM test_tsm TABLESAMPLE tsm_test(false);
+ a
+---
+(0 rows)
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test(true);
+DROP TABLESAMPLE METHOD tsm_test;
+ERROR: cannot drop tablesample method tsm_test because extension tsm_test requires it
+HINT: You can drop extension tsm_test instead.
+DROP EXTENSION tsm_test;
+ERROR: cannot drop extension tsm_test because other objects depend on it
+DETAIL: view test_tsm_v depends on tablesample method tsm_test
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
+DROP EXTENSION tsm_test CASCADE;
+NOTICE: drop cascades to view test_tsm_v
+SELECT * FROM pg_tablesample_method;
+ tsmname | tsminit | tsmnextblock | tsmnexttuple | tsmend | tsmreset | tsmcost
+-----------+--------------------+-------------------------+-------------------------+-------------------+---------------------+--------------------
+ system | tsm_system_init | tsm_system_nextblock | tsm_system_nexttuple | tsm_system_end | tsm_system_reset | tsm_system_cost
+ bernoulli | tsm_bernoulli_init | tsm_bernoulli_nextblock | tsm_bernoulli_nexttuple | tsm_bernoulli_end | tsm_bernoulli_reset | tsm_bernoulli_cost
+(2 rows)
+
diff --git a/src/test/modules/tablesample/sql/tablesample.sql b/src/test/modules/tablesample/sql/tablesample.sql
new file mode 100644
index 0000000..70997bd
--- /dev/null
+++ b/src/test/modules/tablesample/sql/tablesample.sql
@@ -0,0 +1,14 @@
+CREATE EXTENSION tsm_test;
+
+CREATE TABLE test_tsm AS SELECT md5(i::text) a FROM generate_series(1,10) g(i);
+
+SELECT * FROM test_tsm TABLESAMPLE tsm_test(true);
+SELECT * FROM test_tsm TABLESAMPLE tsm_test(false);
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test(true);
+
+DROP TABLESAMPLE METHOD tsm_test;
+DROP EXTENSION tsm_test;
+DROP EXTENSION tsm_test CASCADE;
+
+SELECT * FROM pg_tablesample_method;
diff --git a/src/test/modules/tablesample/tsm_test--1.0.sql b/src/test/modules/tablesample/tsm_test--1.0.sql
new file mode 100644
index 0000000..6cfa014
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test--1.0.sql
@@ -0,0 +1,44 @@
+/* src/test/modules/tablesample/tsm_test--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION tsm_test" to load this file. \quit
+
+CREATE FUNCTION tsm_test_init(internal, int4, bool)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nextblock(internal, bool)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nexttuple(internal, int4, int2, bool)
+RETURNS int2
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_end(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_reset(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_cost(internal, internal, internal, internal, internal, internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+
+CREATE TABLESAMPLE METHOD tsm_test (
+ INIT = tsm_test_init,
+ NEXTBLOCK = tsm_test_nextblock,
+ NEXTTUPLE = tsm_test_nexttuple,
+ END = tsm_test_end,
+ RESET = tsm_test_reset,
+ COST = tsm_test_cost
+);
diff --git a/src/test/modules/tablesample/tsm_test.c b/src/test/modules/tablesample/tsm_test.c
new file mode 100644
index 0000000..77016fd
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.c
@@ -0,0 +1,180 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_test.c
+ * Simple example of a custom tablesample method
+ *
+ * Copyright (c) 2007-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/tablesample/tsm_test.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/tablesample.h"
+
+PG_MODULE_MAGIC;
+
+/* State */
+typedef struct
+{
+ bool ret;
+ BlockNumber tblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ OffsetNumber lt; /* last tuple returned from current block */
+} tsm_test_state;
+
+
+PG_FUNCTION_INFO_V1(tsm_test_init);
+PG_FUNCTION_INFO_V1(tsm_test_nextblock);
+PG_FUNCTION_INFO_V1(tsm_test_nexttuple);
+PG_FUNCTION_INFO_V1(tsm_test_end);
+PG_FUNCTION_INFO_V1(tsm_test_reset);
+PG_FUNCTION_INFO_V1(tsm_test_cost);
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_test_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ bool ret;
+ Relation rel = scanstate->ss.ss_currentRelation;
+ tsm_test_state *state;
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("Return flag cannot be NULL.")));
+
+ ret = PG_GETARG_BOOL(2);
+
+ state = palloc0(sizeof(tsm_test_state));
+
+ /* Remember initial values for reinit */
+ state->ret = ret;
+ state->tblocks = RelationGetNumberOfBlocks(rel);
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+ scanstate->tsmdata = (void *) state;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_test_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ if (!state->ret)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ if (state->blockno == InvalidBlockNumber)
+ state->blockno = 0;
+ else if (++state->blockno >= state->tblocks)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ PG_RETURN_UINT32(state->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ */
+Datum
+tsm_test_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ if (!state->ret)
+ PG_RETURN_UINT16(InvalidOffsetNumber);
+
+ if (state->lt == InvalidOffsetNumber)
+ state->lt = FirstOffsetNumber;
+ else if (++state->lt > maxoffset)
+ state->lt = InvalidOffsetNumber; /* page done; start over on the next block */
+
+ PG_RETURN_UINT16(state->lt);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_test_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_test_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_test_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ SamplePath *path = (SamplePath *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(3);
+ double *tuples = (double *) PG_GETARG_POINTER(4);
+ List *args = path->tsmargs;
+ Node *pctnode;
+ bool ret;
+
+ SamplerAccessStrategy *strategy =
+ (SamplerAccessStrategy *) PG_GETARG_POINTER(5);
+
+ *strategy = SAS_SEQUENTIAL;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ ret = DatumGetBool(((Const *) pctnode)->constvalue);
+ else
+ ret = true;
+
+ *pages = ret ? baserel->pages : 0;
+ *tuples = ret ? baserel->tuples : 0;
+
+ PG_RETURN_VOID();
+}
+
diff --git a/src/test/modules/tablesample/tsm_test.control b/src/test/modules/tablesample/tsm_test.control
new file mode 100644
index 0000000..a7b2741
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.control
@@ -0,0 +1,5 @@
+# tsm_test extension
+comment = 'test module for custom tablesample method'
+default_version = '1.0'
+module_pathname = '$libdir/tsm_test'
+relocatable = true
--
1.9.1
On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
In second patch which implements the TABLESAMPLE itself I changed the
implementation of random generator because when I looked at the code again
I realized the old one would produce wrong results if there were multiple
TABLESAMPLE statements in same query or multiple cursors in same
transaction.
I have looked into this patch and would like to share my
findings with you.
1.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+ /*
+ * There is only one plan to consider but we still need to set
+ * parameters for RelOptInfo.
+ */
+ set_cheapest(rel);
}
It seems we already call set_cheapest(rel) in set_rel_pathlist()
which is a caller of set_tablesample_rel_pathlist(), so why do
we need it inside set_tablesample_rel_pathlist()?
2.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+ /* We only do sample scan if it was requested */
+ add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
}
Do we need to add_path, if there is only one path?
3.
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
Have you considered having a different RTE for this?
4.
SampleNext()
a.
Isn't it better to code SampleNext() similar to SeqNext() and
IndexNext(), which encapsulate the logic to get the tuple in
another function(tbs_next() or something like that)?
b.
Also, don't we want to handle pruning of the page while
scanning (heap_page_prune_opt())? I also observed that
the heap scan APIs check for serializable conflicts
(CheckForSerializableConflictOut()) after the visibility check.
Another thing: don't we want to do anything about sync scans
for these methods, as they are doing a heap scan?
c.
For the bernoulli method, it first gets the tupoffset with
the help of the function and then applies the visibility check. It seems
that could make the sample set much smaller than expected
when a lot of tuples are not visible. Shouldn't it be done the other
way around (first the visibility check, then call the function to see if we want to
include the tuple in the sample)?
I think something similar is done in acquire_sample_rows, which
seems right to me.
5.
CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
generate_series(0, 9) s(i) ORDER BY i;
postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
id
----
1
3
4
7
8
9
(6 rows)
postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
id
----
0
1
2
3
4
5
6
7
8
9
(10 rows)
So the above test yielded 60% of the rows the first time and 100% the next time,
when the test requested 80%.
I understand that the sample percentage yields only an approximate
number of rows; I just wanted us to check whether the
implementation has a problem.
In this regard, I have one question related to below code:
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
..
+ /* Every tuple has percent chance of being returned */
+ while (sampler_random_fract(sampler->randstate) > samplesize)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
Why are we not considering the tuple in the above code
if tupoffset is less than maxoffset?
6.
One larger question about the approach used in the patch: why do you
think it is better to have a new node (SampleScan/SamplePath), as
you have done in the patch, instead of doing it as part of Sequence Scan?
I agree that we would need some changes in different parts of Sequence Scan
execution, but it still sounds like a viable way. One simple thing that
occurs to me is that in ExecSeqScan() we could use something like
SampleSeqNext instead of SeqNext to scan the heap in a slightly different
way. Doing it as part of the sequence scan would probably have the advantage that
we can use most of the existing infrastructure (sequence node path)
and have fewer discrepancies of the kind mentioned in point 4.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 17/01/15 13:46, Amit Kapila wrote:
On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
In second patch which implements the TABLESAMPLE itself I changed the
implementation of random generator because when I looked at the code
again I realized the old one would produce wrong results if there were
multiple TABLESAMPLE statements in same query or multiple cursors in
same transaction.
I have looked into this patch and would like to share my
findings with you.
Thanks a lot for this.
1.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+ /*
+ * There is only one plan to consider but we still need to set
+ * parameters for RelOptInfo.
+ */
+ set_cheapest(rel);
}
It seems we already call set_cheapest(rel) in set_rel_pathlist()
which is a caller of set_tablesample_rel_pathlist(), so why do
we need it inside set_tablesample_rel_pathlist()?
Ah, this changed after I started working on this patch and I didn't
notice - previously all the set_<something>_rel_pathlist called
set_cheapest() individually. I will change the code.
2.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+ /* We only do sample scan if it was requested */
+ add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
}
Do we need to add_path, if there is only one path?
Good point, we can probably just set the pathlist directly in this case.
3.
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
Have you considered having a different RTE for this?
I assume you mean different RTEKind, yes I did, but the fact is that
it's still a relation, just with different scan type and I didn't want
to modify every piece of code which deals with RTE_RELATION to also deal
with the new RTE for sample scan as it seems unnecessary. I am not
strongly opinionated about this one though.
4.
SampleNext()
a.
Isn't it better to code SampleNext() similar to SeqNext() and
IndexNext(), which encapsulate the logic to get the tuple in
another function(tbs_next() or something like that)?
Possibly, my thinking was that unlike the index_getnext() and
heap_getnext(), this function would not be called from any other place
so adding one more layer of abstraction didn't seem useful. And it would
mean creating new ScanDesc struct, etc.
b.
Also, don't we want to handle pruning of the page while
scanning (heap_page_prune_opt())? I also observed that
the heap scan APIs check for serializable conflicts
(CheckForSerializableConflictOut()) after the visibility check.
Both good points, will add.
Another thing: don't we want to do anything about sync scans
for these methods, as they are doing a heap scan?
I don't follow this one tbh.
c.
For the bernoulli method, it first gets the tupoffset with
the help of the function and then applies the visibility check. It seems
that could make the sample set much smaller than expected
when a lot of tuples are not visible. Shouldn't it be done the other
way around (first the visibility check, then call the function to see if we want to
include the tuple in the sample)?
I think something similar is done in acquire_sample_rows, which
seems right to me.
I don't think so, the way bernoulli works is that it returns every tuple
with equal probability, so the visible tuples have same chance of being
returned as the invisible ones so the issue should be smoothed away
automatically (IMHO).
The acquire_sample_rows has limit on number of rows it returns so that's
why it has to do the visibility check before as the problem you
described applies there.
The reason for using the probability instead of tuple limit is the fact
that there is no way to accurately guess number of tuples in the table
at the beginning of scan. This method should actually be better at
returning the correct number of tuples without dependence on how many of
them are visible or not and how much space in blocks is used.
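The argument about ordering can be illustrated with a small standalone sketch. It is not code from the patch: rng_fract() is a toy xorshift generator standing in for sampler_random_fract(), and the visible[] array is an assumed stand-in for the per-tuple visibility check. In both functions each visible tuple is kept with the same probability p, so the expected result size is p times the number of visible tuples either way; only the number of random draws differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy xorshift32 generator returning a value in [0, 1); seed must be nonzero. */
static double
rng_fract(uint32_t *state)
{
    assert(*state != 0);
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    return (*state >> 8) / 16777216.0;
}

/* Flip the coin for every slot first, then drop invisible tuples. */
static size_t
sample_then_filter(uint32_t *rng, const bool *visible, size_t n, double p)
{
    size_t kept = 0;
    size_t i;

    for (i = 0; i < n; i++)
    {
        bool chosen = rng_fract(rng) < p;

        if (chosen && visible[i])
            kept++;
    }
    return kept;
}

/* Drop invisible tuples first, then flip the coin only for visible ones. */
static size_t
filter_then_sample(uint32_t *rng, const bool *visible, size_t n, double p)
{
    size_t kept = 0;
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (visible[i] && rng_fract(rng) < p)
            kept++;
    }
    return kept;
}
```

A fixed row limit behaves differently: there, spending one of the limited picks on an invisible tuple does shrink the result, which is the acquire_sample_rows case mentioned above.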
5.
CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
generate_series(0, 9) s(i) ORDER BY i;
postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
id
----
1
3
4
7
8
9
(6 rows)
postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
id
----
0
1
2
3
4
5
6
7
8
9
(10 rows)
So the above test yielded 60% of the rows the first time and 100% the next time,
when the test requested 80%.
I understand that the sample percentage yields only an approximate
number of rows; I just wanted us to check whether the
implementation has a problem.
I think this is caused by random generator not producing smooth enough
random distribution on such a small sample (or possibly in general?).
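To put numbers on "such a small sample", here is a small standalone calculation, assuming each of the 10 rows is kept independently with probability 0.8 (the textbook Bernoulli model, not a measurement of the patch's generator). The helper names power() and binom_pmf() are made up for this sketch. Under that model, getting all 10 rows has a probability of about 11% and getting exactly 6 rows about 9%, so both results above are unremarkable even with a perfect generator; this does not by itself rule out a generator problem.

```c
#include <assert.h>

/* base raised to a non-negative integer power (avoids linking libm). */
static double
power(double base, int exp)
{
    double result = 1.0;

    while (exp-- > 0)
        result *= base;
    return result;
}

/*
 * Probability that exactly k of n rows are returned when every row is
 * kept independently with probability p (binomial distribution).
 */
static double
binom_pmf(int n, int k, double p)
{
    double coeff = 1.0;
    int    i;

    assert(n >= 0 && k >= 0 && k <= n);
    assert(p >= 0.0 && p <= 1.0);

    /* n choose k, built up multiplicatively instead of via factorials */
    for (i = 1; i <= k; i++)
        coeff = coeff * (n - k + i) / i;

    return coeff * power(p, k) * power(1.0 - p, n - k);
}
```

For example, binom_pmf(10, 8, 0.8) is roughly 0.30, so even the single most likely outcome (8 rows) shows up in fewer than a third of runs.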
In this regard, I have one question related to below code:
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
..
+ /* Every tuple has percent chance of being returned */
+ while (sampler_random_fract(sampler->randstate) > samplesize)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
Why are we not considering the tuple in the above code
if tupoffset is less than maxoffset?
We consider it, I will rename the samplesize to probability or something
to make it more clear what it actually is and maybe expand the comment
above the loop.
What the loop does is that it basically considers each offset on a page
and does "coin flip" - generates random number using
sampler_random_fract and if the random number falls within the
probability (= is smaller or equal to samplesize) it will be returned, the
if (tupoffset > maxoffset)
break;
is there just because we need to give control back to scan so it can
move to another block.
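The coin flip described above can be written as a minimal standalone sketch. It is an illustration only, not the patch's tsm_bernoulli_nexttuple(): rng_fract() is a toy xorshift generator standing in for sampler_random_fract(), bernoulli_next() and the two offset macros are hypothetical names, and the acceptance test uses a strict "less than" so that a probability of 0 never selects a tuple.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_OFFSET 0u   /* stand-in for InvalidOffsetNumber */
#define FIRST_OFFSET   1u   /* stand-in for FirstOffsetNumber */

/* Toy xorshift32 generator returning a value in [0, 1); seed must be nonzero. */
static double
rng_fract(uint32_t *state)
{
    assert(*state != 0);
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    return (*state >> 8) / 16777216.0;
}

/*
 * Return the next offset on the page that wins the coin flip, or
 * INVALID_OFFSET once the page is exhausted.  'last' is the offset
 * returned by the previous call for this page (INVALID_OFFSET on the
 * first call for a page).
 */
static unsigned int
bernoulli_next(uint32_t *rng, double probability,
               unsigned int last, unsigned int maxoffset)
{
    unsigned int off;

    assert(probability >= 0.0 && probability <= 1.0);

    for (off = (last == INVALID_OFFSET) ? FIRST_OFFSET : last + 1;
         off <= maxoffset; off++)
    {
        if (rng_fract(rng) < probability)
            return off;     /* this offset is part of the sample */
    }

    /* Page exhausted: give control back so the scan can move on. */
    return INVALID_OFFSET;
}
```

Every offset up to maxoffset gets exactly one flip; running past maxoffset is just the signal for the caller to fetch another block.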
6.
One larger question about the approach used in the patch: why do you
think it is better to have a new node (SampleScan/SamplePath), as
you have done in the patch, instead of doing it as part of Sequence Scan?
I agree that we would need some changes in different parts of Sequence Scan
execution, but it still sounds like a viable way. One simple thing that
occurs to me is that in ExecSeqScan() we could use something like
SampleSeqNext instead of SeqNext to scan the heap in a slightly different
way. Doing it as part of the sequence scan would probably have the advantage that
we can use most of the existing infrastructure (sequence node path)
and have fewer discrepancies of the kind mentioned in point 4.
I originally started from SeqScan but well, it requires completely
different State structure, it needs to call sampling methods in various
places (not just for next tuple), it needs different handling in explain
and in optimizer (costing). If we add all this to sequential scan it
will not be that much different from new scan node (we'd save the couple
of new copy functions and one struct, but I'd rather have clearly
distinct scan).
It would also not help with the discrepancies you mentioned because
those are in heapam and SampleScan would not use that even if it was
merged with SeqScan - I don't think we want to implement the sampling on
heapam level, it's too much of a hotspot to be good idea to add
additional cycles there.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 17/01/15 13:46, Amit Kapila wrote:
3.
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
Have you considered having a different RTE for this?
I assume you mean different RTEKind, yes I did, but the fact is that it's
still a relation, just with different scan type and I didn't want to modify
every piece of code which deals with RTE_RELATION to also deal with the new
RTE for sample scan as it seems unnecessary. I am not strongly opinionated
about this one though.
No issues, but it seems we should check other paths where
different handling could be required for tablesample scan.
In set_rel_size(), it uses normal path for heapscan (set_plain_rel_size())
for rel size estimates where it checks the presence of partial indexes
and that might affect the size estimates and that doesn't seem to
be required when we have to perform scan based on TableSample
method.
I think we should also check other places where some separate
handling for (rte->inh) is done to see if we need some different handling
for Tablesample scan.
4.
SampleNext()
a.
Isn't it better to code SampleNext() similar to SeqNext() and
IndexNext(), which encapsulate the logic to get the tuple in
another function(tbs_next() or something like that)?
Possibly, my thinking was that unlike the index_getnext() and
heap_getnext(), this function would not be called from any other place so
adding one more layer of abstraction didn't seem useful. And it would mean
creating new ScanDesc struct, etc.
I think adding additional abstraction would simplify the function
and reduce the chance of discrepancy, I think we need to almost
create a duplicate version of code for heapgetpage() method.
Yeah, I agree that we need to build structure like ScanDesc, but
still it will be worth to keep code consistent.
b.
Also don't we want to handle pruning of page while
scanning (heap_page_prune_opt()) and I observed
in heap scan APIs after visibility check we do check
for serializable conflict (CheckForSerializableConflictOut()).
Both good points, will add.
Another one is PageIsAllVisible() check.
Another thing is don't we want to do anything for sync scans
for these methods as they are doing heap scan?
I don't follow this one tbh.
Refer to the parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
Basically during sequential scan on same table by different
backends, we attempt to keep them synchronized to reduce the I/O.
c.
for bernoulli method, it will first get the tupoffset with
the help of function and then apply visibility check, it seems
that could reduce the sample set way lower than expected
in case lot of tuples are not visible, shouldn't it be done in reverse
way(first visibility check, then call function to see if we want to
include it in the sample)?
I think something similar is done in acquire_sample_rows which
seems right to me.
I don't think so, the way bernoulli works is that it returns every tuple
with equal probability, so the visible tuples have same chance of being
returned as the invisible ones so the issue should be smoothed away
automatically (IMHO).
How will it get smoothed away for cases when, let us say,
more than 50% of tuples are not visible? I think it's
due to Postgres' non-overwriting storage architecture
that we maintain multiple versions of rows in the same heap,
otherwise for a different kind of architecture (like mysql/oracle)
where multiple row versions are not maintained in the same
heap, the same function (with same percentage) can return
an entirely different number of rows.
The acquire_sample_rows has limit on number of rows it returns so that's
why it has to do the visibility check before as the problem you described
applies there.
Even though the argument to the tablesample method is a
percentage, that doesn't mean we should give no consideration to the
number of rows returned. The only difference I can see is that with the
tablesample method the number of rows returned will not be exact.
I think it is better to consider only visible rows.
The reason for using the probability instead of tuple limit is the fact
that there is no way to accurately guess number of tuples in the table at
the beginning of scan. This method should actually be better at returning
the correct number of tuples without dependence on how many of them are
visible or not and how much space in blocks is used.
5.
So above test yield 60% rows first time and 100% rows next time,
when the test has requested 80%.
I understand that sample percentage will result in an approximate
number of rows, however I just wanted that we should check if the
implementation has any problem or not?
I think this is caused by random generator not producing smooth enough
random distribution on such a small sample (or possibly in general?).
Yeah it could be possible, I feel we should check with large
sample of rows to see if there is any major difference?
In this regard, I have one question related to below code:
Why are we not considering tuple in above code
if tupoffset is less than maxoffset?
We consider it, I will rename the samplesize to probability or something
to make it more clear what it actually is and maybe expand the comment
above the loop.
Makes sense.
6.
One larger question about the approach used in patch, why do you
think it is better to have a new node (SampleScan/SamplePath) like
you have used in patch instead of doing it as part of Sequence Scan.
I agree that we need some changes in different parts of Sequence Scan
execution, but still sounds like a viable way. One simple thing that
occurs to me is that in ExecSeqScan(), we can use something like
SampleSeqNext instead of SeqNext to scan heap in a slightly different
way, probably doing it as part of sequence scan will have advantage that
we can use most of the existing infrastructure (sequence node path)
and have less discrepancies as mentioned in point-4.
I originally started from SeqScan but well, it requires completely
different State structure, it needs to call sampling methods in various
places (not just for next tuple), it needs different handling in explain
and in optimizer (costing). If we add all this to sequential scan it will
not be that much different from new scan node (we'd save the couple of new
copy functions and one struct, but I'd rather have clearly distinct scan).
I understand that point, but I think it is worth considering if
it can be done as SeqScan node especially because plan node
doesn't need to store any additional information for sample scan.
I think this point needs some more thought, and if none of us
can come up with a clean way to do it via seqscan node then we can
proceed with a separate node idea.
Another reason why I am asking to consider it is that after
spending effort on further review and making the current approach
robust, it should not happen that someone else (probably one
of the committers) should say that it can be done with sequence scan
node without much problems.
It would also not help with the discrepancies you mentioned because those
are in heapam and SampleScan would not use that even if it was merged with
SeqScan - I don't think we want to implement the sampling on heapam level,
it's too much of a hotspot to be good idea to add additional cycles there.
I have no intention of adding more cycles to heap layer, rather
try to use some of the existing APIs if possible.
One more separate observation:
typedef struct SamplePath
{
	Path		path;
	Oid			tsmcost;	/* table sample method costing function */
	List	   *tsmargs;	/* arguments to a TABLESAMPLE clause */
} SamplePath;
a.
Do we really need to have tsmcost and tsmargs stored in SamplePath
when we don't want to maintain them in plan (SamplePlan), patch gets
all the info via RTE in executor, so it seems to me we can do without
them.
b.
* SamplePath represents a sample sacn
typo /sacn/scan
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 19/01/15 07:08, Amit Kapila wrote:
On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
No issues, but it seems we should check other paths where
different handling could be required for tablesample scan.
In set_rel_size(), it uses normal path for heapscan (set_plain_rel_size())
for rel size estimates where it checks the presence of partial indexes
and that might affect the size estimates and that doesn't seem to
be required when we have to perform scan based on TableSample
method.
I think that's actually good to have, because we still do costing and
the partial index might help produce better estimate of number of
returned rows for tablesample as well.
4.
SampleNext()
a.
Isn't it better to code SampleNext() similar to SeqNext() and
IndexNext(), which encapsulate the logic to get the tuple in
another function(tbs_next() or something like that)?
Possibly, my thinking was that unlike the index_getnext() and
heap_getnext(), this function would not be called from any other
place so adding one more layer of abstraction didn't seem useful.
And it would mean creating new ScanDesc struct, etc.
I think adding additional abstraction would simplify the function
and reduce the chance of discrepancy, I think we need to almost
create a duplicate version of code for heapgetpage() method.
Yeah, I agree that we need to build structure like ScanDesc, but
still it will be worth to keep code consistent.
Well similar, not same as we are not always fetching whole page or doing
visibility checks on all tuples in the page, etc. But I don't see why it
can't be inside nodeSamplescan. If you look at bitmap heap scan, that
one is also essentially somewhat modified sequential scan and does
everything within the node nodeBitmapHeapscan because the algorithm is
not used anywhere else, just like sample scan.
Another one is PageIsAllVisible() check.
Done.
Another thing is don't we want to do anything for sync scans
for these methods as they are doing heap scan?
I don't follow this one tbh.
Refer to the parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
Basically during sequential scan on same table by different
backends, we attempt to keep them synchronized to reduce the I/O.
Ah this, yes, it makes sense for bernoulli, not for system though. I
guess it should be used for sampling methods that use SAS_SEQUENTIAL
strategy.
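The synchronization idea can be illustrated with a toy model. This is only a sketch and not the code in syncscan.c: the names ToyScan, toy_scan_begin, toy_scan_next_block and shared_scan_position are hypothetical, and the real implementation reports its position only periodically and keeps it in shared memory. The point is that a scan which must read every block anyway (as bernoulli does) can start wherever another scan currently is and wrap around, while a method that picks only some blocks gains nothing from this.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int BlockNo;	/* toy stand-in for a block number */

/* Hypothetical shared hint: last block reported by any scan of the table. */
static BlockNo shared_scan_position = 0;

typedef struct ToyScan
{
	BlockNo		nblocks;	/* size of the relation in blocks */
	BlockNo		start;		/* block this scan started at */
	BlockNo		visited;	/* number of blocks returned so far */
} ToyScan;

/* Start a scan at the position most recently reported by another scan. */
static void
toy_scan_begin(ToyScan *scan, BlockNo nblocks)
{
	assert(nblocks > 0);
	scan->nblocks = nblocks;
	scan->start = shared_scan_position % nblocks;
	scan->visited = 0;
}

/*
 * Return 1 and set *blk to the next block, wrapping around the end of the
 * relation, or return 0 once every block has been visited exactly once.
 */
static int
toy_scan_next_block(ToyScan *scan, BlockNo *blk)
{
	if (scan->visited >= scan->nblocks)
		return 0;
	*blk = (scan->start + scan->visited) % scan->nblocks;
	scan->visited++;
	shared_scan_position = *blk;	/* report progress for later scans */
	return 1;
}
```

In this model a second scan that begins while the first one is at block 3 of 10 visits blocks 3..9 and then 0..2, so both scans touch the same blocks at roughly the same time.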
c.
for bernoulli method, it will first get the tupoffset with
the help of function and then apply visibility check, it seems
that could reduce the sample set way lower than expected
in case lot of tuples are not visible, shouldn't it be done in reverse
way(first visibility check, then call function to see if we want to
include it in the sample)?
I think something similar is done in acquire_sample_rows which
seems right to me.
I don't think so, the way bernoulli works is that it returns every
tuple with equal probability, so the visible tuples have same chance
of being returned as the invisible ones so the issue should be
smoothed away automatically (IMHO).
How will it get smoothed away for cases when, let us say,
more than 50% of tuples are not visible? I think it's
due to Postgres' non-overwriting storage architecture
that we maintain multiple version of rows in same heap,
otherwise for different kind of architecture (like mysql/oracle)
where multiple row versions are not maintained in same
heap, the same function (with same percentage) can return
entirely different number of rows.
I don't know how else to explain, if we loop over every tuple in the
relation and return it with equal probability then visibility checks
don't matter as the percentage of visible tuples will be same in the
result as in the relation.
For example if you have 30% visible tuples and you are interested in 10%
of tuples overall it will return 10% of all tuples and since every tuple
has 10% chance of being returned, 30% of those should be visible (if we
assume smooth distribution of random numbers generated). So in the end
you are getting 10% of visible tuples because the scan will obviously
skip the invisible ones and that's what you wanted.
As I said problem of analyze is that it uses tuple limit instead of
probability.
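The argument above can be checked with a small simulation. This is a sketch under stated assumptions, not code from the patch: the function names, the xorshift PRNG and the array of visibility flags are all made up for illustration. It compares "flip the coin, then check visibility" with "check visibility, then flip the coin" and shows that both return roughly prob times the number of visible tuples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift32) so the sketch is reproducible. */
static uint32_t rng_state = 2463534242u;

static double
next_uniform(void)
{
	rng_state ^= rng_state << 13;
	rng_state ^= rng_state >> 17;
	rng_state ^= rng_state << 5;
	return rng_state / 4294967296.0;	/* value in [0, 1) */
}

/*
 * Order discussed for the patch: every tuple gets a coin flip, and a
 * selected tuple that turns out to be invisible is simply skipped.
 */
static size_t
sample_then_filter(const bool *visible, size_t ntuples, double prob)
{
	size_t		returned = 0;

	assert(prob >= 0.0 && prob <= 1.0);
	for (size_t i = 0; i < ntuples; i++)
		if (next_uniform() < prob && visible[i])
			returned++;
	return returned;
}

/* Reverse order: only visible tuples get a coin flip at all. */
static size_t
filter_then_sample(const bool *visible, size_t ntuples, double prob)
{
	size_t		returned = 0;

	assert(prob >= 0.0 && prob <= 1.0);
	for (size_t i = 0; i < ntuples; i++)
		if (visible[i] && next_uniform() < prob)
			returned++;
	return returned;
}
```

With 100000 tuples of which 30% are visible and prob = 0.1, both functions should return a count close to 3000. The order matters only when the method targets a fixed number of rows instead of a probability, which is the acquire_sample_rows case mentioned in the thread.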
5.
So above test yield 60% rows first time and 100% rows next time,
when the test has requested 80%.
I understand that sample percentage will result in an approximate
number of rows, however I just wanted that we should check if the
implementation has any problem or not?
I think this is caused by random generator not producing smooth
enough random distribution on such a small sample (or possibly in
general?).
Yeah it could be possible, I feel we should check with large
sample of rows to see if there is any major difference?
In this regard, I have one question related to below code:
Why are we not considering tuple in above code
if tupoffset is less than maxoffset?
We consider it, I will rename the samplesize to probability or
something to make it more clear what it actually is and maybe expand
the comment above the loop.
Yes, the difference is smaller (in relative numbers) for bigger tables
when I test this. On 1k row table the spread of 20 runs was between
770-818 and on 100k row table it's between 79868-80154. I think that's
acceptable variance and it's imho indeed the random generator that
produces this.
Oh and BTW when I delete 50k of tuples (make them invisible) the results
of 20 runs are between 30880 and 40063 rows.
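As a rough sanity check of the spreads reported for the 1k and 100k row tables, one can compare them with what independent coin flips would give. The sketch below assumes the 80% sampling percentage discussed earlier in the thread and that each tuple is selected independently; the helper name within_k_sigma is hypothetical. Under those assumptions the row count is binomial with mean n*p and variance n*p*(1-p), so the standard deviation is about 12.6 rows for n = 1000 and about 126 rows for n = 100000.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * For n independent selections with probability p, report whether an
 * observed row count lies within k standard deviations of the mean n*p.
 * Squared values are compared so that sqrt() is not needed.
 */
static bool
within_k_sigma(double n, double p, double observed, double k)
{
	double		mean = n * p;
	double		variance = n * p * (1.0 - p);
	double		diff = observed - mean;

	assert(n > 0.0 && p >= 0.0 && p <= 1.0 && k >= 0.0);
	return diff * diff <= k * k * variance;
}
```

For example, within_k_sigma(1000, 0.8, 770, 3) and within_k_sigma(100000, 0.8, 80154, 3) both hold, so the extremes reported for those two tables are within three standard deviations of the expected count under this model.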
I originally started from SeqScan but well, it requires completely
different State structure, it needs to call sampling methods in
various places (not just for next tuple), it needs different
handling in explain and in optimizer (costing). If we add all this
to sequential scan it will not be that much different from new scan
node (we'd save the couple of new copy functions and one struct, but
I'd rather have clearly distinct scan).
I understand that point, but I think it is worth considering if
it can be done as SeqScan node especially because plan node
doesn't need to store any additional information for sample scan.
I think this point needs some more thoughts and if none of us
could come with a clean way to do it via seqscan node then we can
proceed with a separate node idea.
Another reason why I am asking to consider it is that after
spending effort on further review and making the current approach
robust, it should not happen that someone else (probably one
of the committers) should say that it can be done with sequence scan
node without much problems.
I am sure it could be done with sequence scan. Not sure if it would be
pretty and more importantly, the TABLESAMPLE is *not* sequential scan.
The fact that bernoulli method looks similar should not make us assume
that every other method does as well, especially when system method is
completely different.
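The difference between the two methods can be sketched with a toy driver built around nextblock and nexttuple callbacks. Everything here is an assumption made for illustration: the ToyMethod struct, the callback signatures and the driver are not the patch's API, and the toy "system" method uses a plain per-block coin flip in place of the ANALYZE block sampling algorithm that the patch description mentions. The toy bernoulli method visits every block and flips a coin per tuple; the toy system method flips a coin per block and returns every tuple of a chosen block.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID (-1)

/* Hypothetical method interface; the signatures are assumptions. */
typedef struct ToyMethod
{
	/* next block after 'prev' (INVALID to start), or INVALID when done */
	int			(*nextblock) (int prev, int nblocks, double prob);
	/* next tuple offset after 'prev' in a block, or INVALID when done */
	int			(*nexttuple) (int prev, int ntuples, double prob);
} ToyMethod;

static uint32_t toy_rng = 88172645u;

static double
toy_uniform(void)
{
	toy_rng ^= toy_rng << 13;
	toy_rng ^= toy_rng >> 17;
	toy_rng ^= toy_rng << 5;
	return toy_rng / 4294967296.0;	/* value in [0, 1) */
}

/* Return every item in order. */
static int
every_item(int prev, int count, double prob)
{
	(void) prob;
	return (prev + 1 < count) ? prev + 1 : INVALID;
}

/* Return the next item that wins an independent coin flip. */
static int
coin_flip_item(int prev, int count, double prob)
{
	for (int i = prev + 1; i < count; i++)
		if (toy_uniform() < prob)
			return i;
	return INVALID;
}

static const ToyMethod toy_bernoulli = {every_item, coin_flip_item};
static const ToyMethod toy_system = {coin_flip_item, every_item};

typedef struct ToyResult
{
	int			blocks_read;
	int			tuples_returned;
} ToyResult;

/* Drive a method over a relation of nblocks blocks. */
static ToyResult
toy_sample_scan(const ToyMethod *m, int nblocks, int tuples_per_block,
				double prob)
{
	ToyResult	res = {0, 0};

	assert(nblocks >= 0 && tuples_per_block >= 0);
	for (int blk = m->nextblock(INVALID, nblocks, prob);
		 blk != INVALID;
		 blk = m->nextblock(blk, nblocks, prob))
	{
		res.blocks_read++;
		for (int off = m->nexttuple(INVALID, tuples_per_block, prob);
			 off != INVALID;
			 off = m->nexttuple(off, tuples_per_block, prob))
			res.tuples_returned++;
	}
	return res;
}
```

In this model the bernoulli variant always reads all blocks, while the system variant reads only about prob times nblocks of them. That difference in access pattern is the reason given above for not treating every sampling method as a variation of a sequential scan.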
One more separate observation:
typedef struct SamplePath
{
	Path		path;
	Oid			tsmcost;	/* table sample method costing function */
	List	   *tsmargs;	/* arguments to a TABLESAMPLE clause */
} SamplePath;
a.
Do we really need to have tsmcost and tsmargs stored in SamplePath
when we don't want to maintain them in plan (SamplePlan), patch gets
all the info via RTE in executor, so it seems to me we can do without
them.
Hmm yeah, we actually don't, I removed it.
Anyway, attached is new version with some updates that you mentioned
(all_visible, etc).
I also added optional interface for the sampling method to access the
tuple contents as I can imagine sampling methods that will want to do
that. And I updated the test/sample module to use this new api.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0003-tablesample-ddl-v3.patch (text/x-diff)
From 78afb16082939aa284613a1d57f869a926decaf6 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:51:44 +0100
Subject: [PATCH 3/3] tablesample-ddl v3
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_tablesamplemethod.sgml | 161 ++++++++
doc/src/sgml/ref/drop_tablesamplemethod.sgml | 87 +++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/dependency.c | 15 +-
src/backend/catalog/objectaddress.c | 65 +++-
src/backend/commands/Makefile | 6 +-
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/tablecmds.c | 1 +
src/backend/commands/tablesample.c | 419 +++++++++++++++++++++
src/backend/parser/gram.y | 14 +-
src/backend/tcop/utility.c | 12 +
src/include/catalog/dependency.h | 1 +
src/include/catalog/pg_tablesample_method.h | 11 +
src/include/nodes/parsenodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/modules/Makefile | 3 +-
src/test/modules/tablesample/.gitignore | 4 +
src/test/modules/tablesample/Makefile | 21 ++
.../modules/tablesample/expected/tablesample.out | 38 ++
src/test/modules/tablesample/sql/tablesample.sql | 14 +
src/test/modules/tablesample/tsm_test--1.0.sql | 50 +++
src/test/modules/tablesample/tsm_test.c | 218 +++++++++++
src/test/modules/tablesample/tsm_test.control | 5 +
25 files changed, 1150 insertions(+), 8 deletions(-)
create mode 100644 doc/src/sgml/ref/create_tablesamplemethod.sgml
create mode 100644 doc/src/sgml/ref/drop_tablesamplemethod.sgml
create mode 100644 src/backend/commands/tablesample.c
create mode 100644 src/test/modules/tablesample/.gitignore
create mode 100644 src/test/modules/tablesample/Makefile
create mode 100644 src/test/modules/tablesample/expected/tablesample.out
create mode 100644 src/test/modules/tablesample/sql/tablesample.sql
create mode 100644 src/test/modules/tablesample/tsm_test--1.0.sql
create mode 100644 src/test/modules/tablesample/tsm_test.c
create mode 100644 src/test/modules/tablesample/tsm_test.control
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 7aa3128..2fad084 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -78,6 +78,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createServer SYSTEM "create_server.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
+<!ENTITY createTablesampleMethod SYSTEM "create_tablesamplemethod.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
<!ENTITY createTrigger SYSTEM "create_trigger.sgml">
<!ENTITY createTSConfig SYSTEM "create_tsconfig.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
+<!ENTITY dropTablesampleMethod SYSTEM "drop_tablesamplemethod.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTrigger SYSTEM "drop_trigger.sgml">
<!ENTITY dropTSConfig SYSTEM "drop_tsconfig.sgml">
diff --git a/doc/src/sgml/ref/create_tablesamplemethod.sgml b/doc/src/sgml/ref/create_tablesamplemethod.sgml
new file mode 100644
index 0000000..566522e
--- /dev/null
+++ b/doc/src/sgml/ref/create_tablesamplemethod.sgml
@@ -0,0 +1,161 @@
+<!--
+doc/src/sgml/ref/create_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATETABLESAMPLEMETHOD">
+ <indexterm zone="sql-createtablesamplemethod">
+ <primary>CREATE TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE TABLESAMPLE METHOD</refname>
+ <refpurpose>define custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE TABLESAMPLE METHOD <replaceable class="parameter">name</replaceable> (
+ INIT = <replaceable class="parameter">init_function</replaceable> ,
+ NEXTBLOCK = <replaceable class="parameter">nextblock_function</replaceable> ,
+ NEXTTUPLE = <replaceable class="parameter">nexttuple_function</replaceable> ,
+ [ RETURNTUPLE = <replaceable class="parameter">returntuple_function</replaceable> , ]
+ END = <replaceable class="parameter">end_function</replaceable> ,
+ RESET = <replaceable class="parameter">reset_function</replaceable> ,
+ COST = <replaceable class="parameter">cost_function</replaceable>
+)
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>CREATE TABLESAMPLE METHOD</command> creates a tablesample method.
+ A tablesample method provides an algorithm for reading a sample part of a table
+ when used in <command>TABLESAMPLE</> clause of a <command>SELECT</>
+ statement.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>CREATE TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of the tablesample method to be created. This name must be
+ unique within the database.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">init_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the init function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nextblock_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-block function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nexttuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-tuple function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">returntuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the function for inspecting the tuple contents in order
+ to make decision if it should be returned or not. This parameter
+ is optional.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">end_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the end function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">reset_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the reset function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">cost_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the costing function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>
+ The function names can be schema-qualified if necessary. Argument types
+ are not given, since the argument list for each type of function is
+ predetermined. All functions except the returntuple function are required.
+ </para>
+
+ <para>
+ The arguments can appear in any order, not only the one shown above.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>CREATE TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-droptablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_tablesamplemethod.sgml b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
new file mode 100644
index 0000000..dffd2ec
--- /dev/null
+++ b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
@@ -0,0 +1,87 @@
+<!--
+doc/src/sgml/ref/drop_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPTABLESAMPLEMETHOD">
+ <indexterm zone="sql-droptablesamplemethod">
+ <primary>DROP TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP TABLESAMPLE METHOD</refname>
+ <refpurpose>remove a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP TABLESAMPLE METHOD [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP TABLESAMPLE METHOD</command> drops an existing tablesample
+ method.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>DROP TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the tablesample method does not exist.
+ A notice is issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an existing tablesample method to be removed.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>DROP TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createtablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 10c9a6d..2c09a3c 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -106,6 +106,7 @@
&createServer;
&createTable;
&createTableAs;
+ &createTablesampleMethod;
&createTableSpace;
&createTSConfig;
&createTSDictionary;
@@ -147,6 +148,7 @@
&dropSequence;
&dropServer;
&dropTable;
+ &dropTablesampleMethod;
&dropTableSpace;
&dropTSConfig;
&dropTSDictionary;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index bacb242..6acb5b3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -46,6 +46,7 @@
#include "catalog/pg_policy.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -157,7 +158,8 @@ static const Oid object_classes[MAX_OCLASS] = {
DefaultAclRelationId, /* OCLASS_DEFACL */
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
- PolicyRelationId /* OCLASS_POLICY */
+ PolicyRelationId, /* OCLASS_POLICY */
+ TableSampleMethodRelationId /* OCLASS_TABLESAMPLEMETHOD */
};
@@ -1265,6 +1267,10 @@ doDeletion(const ObjectAddress *object, int flags)
RemovePolicyById(object->objectId);
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ RemoveTablesampleMethodById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -1794,6 +1800,10 @@ find_expr_references_walker(Node *node,
case RTE_RELATION:
add_object_address(OCLASS_CLASS, rte->relid, 0,
context->addrs);
+ if (rte->tablesample)
+ add_object_address(OCLASS_TABLESAMPLEMETHOD,
+ rte->tablesample->tsmid, 0,
+ context->addrs);
break;
default:
break;
@@ -2373,6 +2383,9 @@ getObjectClass(const ObjectAddress *object)
case PolicyRelationId:
return OCLASS_POLICY;
+
+ case TableSampleMethodRelationId:
+ return OCLASS_TABLESAMPLEMETHOD;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 825d8b2..02edc0a 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -44,6 +44,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_policy.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -429,7 +430,19 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
- }
+ },
+ {
+ TableSampleMethodRelationId,
+ TableSampleMethodOidIndexId,
+ TABLESAMPLEMETHODOID,
+ TABLESAMPLEMETHODNAME,
+ Anum_pg_tablesample_method_tsmname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ true
+ },
};
/*
@@ -528,7 +541,9 @@ ObjectTypeMap[] =
/* OCLASS_EVENT_TRIGGER */
{ "event trigger", OBJECT_EVENT_TRIGGER },
/* OCLASS_POLICY */
- { "policy", OBJECT_POLICY }
+ { "policy", OBJECT_POLICY },
+ /* OCLASS_TABLESAMPLEMETHOD */
+ { "tablesample method", OBJECT_TABLESAMPLEMETHOD }
};
@@ -670,6 +685,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_FDW:
case OBJECT_FOREIGN_SERVER:
case OBJECT_EVENT_TRIGGER:
+ case OBJECT_TABLESAMPLEMETHOD:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -896,6 +912,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_EVENT_TRIGGER:
msg = gettext_noop("event trigger name cannot be qualified");
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -956,6 +975,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_event_trigger_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ address.classId = TableSampleMethodRelationId;
+ address.objectId = get_tablesample_method_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -1720,6 +1744,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
break;
case OBJECT_TSPARSER:
case OBJECT_TSTEMPLATE:
+ case OBJECT_TABLESAMPLEMETHOD:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -2654,6 +2679,21 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("tablesample method %s"),
+ NameStr(((Form_pg_tablesample_method) GETSTRUCT(tup))->tsmname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -3131,6 +3171,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "policy");
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ appendStringInfoString(&buffer, "tablesample method");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4025,6 +4069,23 @@ getObjectIdentityParts(const ObjectAddress *object,
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+ Form_pg_tablesample_method tsmForm;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ tsmForm = (Form_pg_tablesample_method) GETSTRUCT(tup);
+ appendStringInfoString(&buffer,
+ quote_identifier(NameStr(tsmForm->tsmname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..04fcd8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o tablecmds.o tablesample.o \
+ tablespace.o trigger.o tsearchcmds.o typecmds.o user.o vacuum.o \
+ vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index e5185ba..04d29a2 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -421,6 +421,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
args = strVal(linitial(objargs));
}
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
default:
elog(ERROR, "unexpected object type (%d)", (int) objtype);
break;
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index a33a5ad..f20e9f7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -97,6 +97,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SEQUENCE", true},
{"SERVER", true},
{"TABLE", true},
+ {"TABLESAMPLE METHOD", true},
{"TABLESPACE", false},
{"TRIGGER", true},
{"TEXT SEARCH CONFIGURATION", true},
@@ -1078,6 +1079,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_SEQUENCE:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
+ case OBJECT_TABLESAMPLEMETHOD:
case OBJECT_TRIGGER:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
@@ -1134,6 +1136,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_TABLESAMPLEMETHOD:
return true;
case MAX_OCLASS:
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 66d5083..b67c560 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8059,6 +8059,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
case OCLASS_USER_MAPPING:
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
+ case OCLASS_TABLESAMPLEMETHOD:
/*
* We don't expect any of these sorts of objects to depend on
diff --git a/src/backend/commands/tablesample.c b/src/backend/commands/tablesample.c
new file mode 100644
index 0000000..16882d4
--- /dev/null
+++ b/src/backend/commands/tablesample.c
@@ -0,0 +1,419 @@
+/*-------------------------------------------------------------------------
+ *
+ * tablesample.c
+ * Commands to manipulate tablesample methods
+ *
+ * Table sampling methods provide algorithms for doing a sample scan over
+ * a table.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/tablesample.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "parser/parse_func.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+
+
+static Datum
+get_tabmesample_method_func(DefElem *defel, int attnum)
+{
+ List *funcName = defGetQualifiedName(defel);
+ /* Big enough size for our needs. */
+ Oid *typeId = palloc0(7 * sizeof(Oid));
+ Oid retTypeId;
+ int nargs;
+ Oid procOid = InvalidOid;
+ FuncCandidateList clist;
+
+ switch (attnum)
+ {
+ case Anum_pg_tablesample_method_tsminit:
+ /*
+ * tsminit needs special handling because it is defined as a function
+ * with 3 or more arguments, of which only the first two must have
+ * specific types; the rest are up to the tablesample method creator.
+ */
+ {
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ retTypeId = VOIDOID;
+
+ clist = FuncnameGetCandidates(funcName, -1, NIL, false, false, false);
+
+ while (clist)
+ {
+ if (clist->nargs >= 3 &&
+ memcmp(typeId, clist->args, nargs * sizeof(Oid)) == 0)
+ {
+ procOid = clist->oid;
+ /* Save real function signature for future errors. */
+ nargs = clist->nargs;
+ pfree(typeId);
+ typeId = clist->args;
+ break;
+ }
+ clist = clist->next;
+ }
+
+ if (!OidIsValid(procOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("function \"%s\" does not exist or does not have valid signature",
+ NameListToString(funcName)),
+ errhint("The tablesample method init function "
+ "must have at least 3 input parameters "
+ "with first one of type INTERNAL and second of type INTEGER.")));
+ }
+ break;
+
+ case Anum_pg_tablesample_method_tsmnextblock:
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = BOOLOID;
+ retTypeId = INT4OID;
+ break;
+ case Anum_pg_tablesample_method_tsmnexttuple:
+ nargs = 4;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INT2OID;
+ typeId[3] = BOOLOID;
+ retTypeId = INT2OID;
+ break;
+ case Anum_pg_tablesample_method_tsmreturntuple:
+ nargs = 4;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = BOOLOID;
+ retTypeId = BOOLOID;
+ break;
+ case Anum_pg_tablesample_method_tsmend:
+ case Anum_pg_tablesample_method_tsmreset:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ case Anum_pg_tablesample_method_tsmcost:
+ nargs = 7;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INTERNALOID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = INTERNALOID;
+ typeId[4] = INTERNALOID;
+ typeId[5] = INTERNALOID;
+ typeId[6] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ default:
+ /* should not be here */
+ elog(ERROR, "unrecognized attribute for tablesample method: %d",
+ attnum);
+ nargs = 0; /* keep compiler quiet */
+ }
+
+ if (!OidIsValid(procOid))
+ procOid = LookupFuncName(funcName, nargs, typeId, false);
+ if (get_func_rettype(procOid) != retTypeId)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("function %s should return type %s",
+ func_signature_string(funcName, nargs, NIL, typeId),
+ format_type_be(retTypeId))));
+
+ return ObjectIdGetDatum(procOid);
+}
+
+/*
+ * make pg_depend entries for a new pg_tablesample_method entry
+ */
+static void
+makeTablesampleMethodDeps(HeapTuple tuple)
+{
+ Form_pg_tablesample_method tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ ObjectAddress myself,
+ referenced;
+
+ myself.classId = TableSampleMethodRelationId;
+ myself.objectId = HeapTupleGetOid(tuple);
+ myself.objectSubId = 0;
+
+ /* dependency on extension */
+ recordDependencyOnCurrentExtension(&myself, false);
+
+ /* dependencies on functions */
+ referenced.classId = ProcedureRelationId;
+ referenced.objectSubId = 0;
+
+ referenced.objectId = tsm->tsminit;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnextblock;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnexttuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ if (OidIsValid(tsm->tsmreturntuple))
+ {
+ referenced.objectId = tsm->tsmreturntuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ }
+
+ referenced.objectId = tsm->tsmend;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmreset;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmcost;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+}
+
+/*
+ * Create a table sampling method
+ *
+ * Only superusers can create table sampling methods.
+ */
+Oid
+DefineTablesampleMethod(List *names, List *parameters)
+{
+ char *tsmname = strVal(linitial(names));
+ Oid tsmoid;
+ ListCell *pl;
+ Relation rel;
+ Datum values[Natts_pg_tablesample_method];
+ bool nulls[Natts_pg_tablesample_method];
+ HeapTuple tuple;
+
+ /* Must be superuser. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to create tablesample method \"%s\"",
+ tsmname),
+ errhint("Must be superuser to create a tablesample method.")));
+
+ /* Must not already exist. */
+ tsmoid = get_tablesample_method_oid(tsmname, true);
+ if (OidIsValid(tsmoid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("tablesample method \"%s\" already exists",
+ tsmname)));
+
+ /* Initialize the values. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_tablesample_method_tsmname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(tsmname));
+
+ /*
+ * loop over the definition list and extract the information we need.
+ */
+ foreach(pl, parameters)
+ {
+ DefElem *defel = (DefElem *) lfirst(pl);
+
+ if (pg_strcasecmp(defel->defname, "init") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsminit - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsminit);
+ }
+ else if (pg_strcasecmp(defel->defname, "nextblock") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnextblock - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnextblock);
+ }
+ else if (pg_strcasecmp(defel->defname, "nexttuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnexttuple - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnexttuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "returntuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmreturntuple - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmreturntuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "end") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmend - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmend);
+ }
+ else if (pg_strcasecmp(defel->defname, "reset") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmreset - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmreset);
+ }
+ else if (pg_strcasecmp(defel->defname, "cost") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmcost - 1] =
+ get_tabmesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmcost);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("tablesample method parameter \"%s\" not recognized",
+ defel->defname)));
+ }
+
+ /*
+ * Validation.
+ */
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsminit - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method init function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnextblock - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nextblock function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnexttuple - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nexttuple function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmend - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method end function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmreset - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method reset function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmcost - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method cost function is required")));
+
+ /*
+ * Insert tuple into pg_tablesample_method.
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = heap_form_tuple(rel->rd_att, values, nulls);
+
+ tsmoid = simple_heap_insert(rel, tuple);
+
+ CatalogUpdateIndexes(rel, tuple);
+
+ makeTablesampleMethodDeps(tuple);
+
+ heap_freetuple(tuple);
+
+ /* Post creation hook for new tablesample method */
+ InvokeObjectPostCreateHook(TableSampleMethodRelationId, tsmoid, 0);
+
+ heap_close(rel, RowExclusiveLock);
+
+ return tsmoid;
+}
+
+/*
+ * Drop a tablesample method.
+ */
+void
+RemoveTablesampleMethodById(Oid tsmoid)
+{
+ Relation rel;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+
+ /*
+ * Find the target tuple
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tsmoid));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ tsmoid);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ /* Can't drop builtin tablesample methods. */
+ if (tsmoid == TABLESAMPLE_METHOD_SYSTEM_OID ||
+ tsmoid == TABLESAMPLE_METHOD_BERNOULLI_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied for tablesample method %s",
+ NameStr(tsm->tsmname))));
+
+ /*
+ * Remove the pg_tablesample_method tuple (this will roll back if we fail below)
+ */
+ simple_heap_delete(rel, &tuple->t_self);
+
+ ReleaseSysCache(tuple);
+
+ heap_close(rel, RowExclusiveLock);
+}
+
+/*
+ * get_tablesample_method_oid - given a tablesample method name,
+ * look up the OID
+ *
+ * If missing_ok is false, throw an error if tablesample method name not found.
+ * If true, just return InvalidOid.
+ */
+Oid
+get_tablesample_method_oid(const char *tsmname, bool missing_ok)
+{
+ Oid result;
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(tsmname));
+ if (HeapTupleIsValid(tuple))
+ {
+ result = HeapTupleGetOid(tuple);
+ ReleaseSysCache(tuple);
+ }
+ else
+ result = InvalidOid;
+
+ if (!OidIsValid(result) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ tsmname)));
+
+ return result;
+}
+
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ac5e095..4578b5e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -586,7 +586,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
- MAPPING MATCH MATERIALIZED MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
+ MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P
+ MOVE
NAME_P NAMES NATIONAL NATURAL NCHAR NEXT NO NONE
NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -5094,6 +5095,15 @@ DefineStmt:
n->definition = list_make1(makeDefElem("from", (Node *) $5));
$$ = (Node *)n;
}
+ | CREATE TABLESAMPLE METHOD name definition
+ {
+ DefineStmt *n = makeNode(DefineStmt);
+ n->kind = OBJECT_TABLESAMPLEMETHOD;
+ n->args = NIL;
+ n->defnames = list_make1(makeString($4));
+ n->definition = $5;
+ $$ = (Node *)n;
+ }
;
definition: '(' def_list ')' { $$ = $2; }
@@ -5552,6 +5562,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | TABLESAMPLE METHOD { $$ = OBJECT_TABLESAMPLEMETHOD; }
;
any_name_list:
@@ -13313,6 +13324,7 @@ unreserved_keyword:
| MATCH
| MATERIALIZED
| MAXVALUE
+ | METHOD
| MINUTE_P
| MINVALUE
| MODE
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 3533cfa..532256d 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -23,6 +23,7 @@
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/namespace.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/toasting.h"
#include "commands/alter.h"
#include "commands/async.h"
@@ -1106,6 +1107,11 @@ ProcessUtilitySlow(Node *parsetree,
Assert(stmt->args == NIL);
DefineCollation(stmt->defnames, stmt->definition);
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ Assert(stmt->args == NIL);
+ Assert(list_length(stmt->defnames) == 1);
+ DefineTablesampleMethod(stmt->defnames, stmt->definition);
+ break;
default:
elog(ERROR, "unrecognized define stmt type: %d",
(int) stmt->kind);
@@ -1960,6 +1966,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_POLICY:
tag = "DROP POLICY";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "DROP TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
@@ -2056,6 +2065,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_COLLATION:
tag = "CREATE COLLATION";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "CREATE TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 6481ac8..30653f8 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -148,6 +148,7 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
+ OCLASS_TABLESAMPLEMETHOD, /* pg_tablesample_method */
MAX_OCLASS /* MUST BE LAST */
} ObjectClass;
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
index ea11f45..db1e7eb 100644
--- a/src/include/catalog/pg_tablesample_method.h
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -67,7 +67,18 @@ typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
DATA(insert OID = 3283 ( system tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
DESCR("SYSTEM table sampling method");
+#define TABLESAMPLE_METHOD_SYSTEM_OID 3283
DATA(insert OID = 3284 ( bernoulli tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
DESCR("BERNOULLI table sampling method");
+#define TABLESAMPLE_METHOD_BERNOULLI_OID 3284
+
+/* ----------------
+ * functions for manipulation of pg_tablesample_method
+ * ----------------
+ */
+
+extern Oid DefineTablesampleMethod(List *names, List *parameters);
+extern void RemoveTablesampleMethodById(Oid tsmoid);
+extern Oid get_tablesample_method_oid(const char *tsmname, bool missing_ok);
#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 7927a82..a3cf011 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1268,6 +1268,7 @@ typedef enum ObjectType
OBJECT_SEQUENCE,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
+ OBJECT_TABLESAMPLEMETHOD,
OBJECT_TABLESPACE,
OBJECT_TRIGGER,
OBJECT_TSCONFIGURATION,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 6ff7b44..c3269c0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -236,6 +236,7 @@ PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD)
+PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD)
PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("minvalue", MINVALUE, UNRESERVED_KEYWORD)
PG_KEYWORD("mode", MODE, UNRESERVED_KEYWORD)
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 93d93af..37ea524 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -9,7 +9,8 @@ SUBDIRS = \
worker_spi \
dummy_seclabel \
test_shm_mq \
- test_parser
+ test_parser \
+ tablesample
all: submake-errcodes
diff --git a/src/test/modules/tablesample/.gitignore b/src/test/modules/tablesample/.gitignore
new file mode 100644
index 0000000..5dcb3ff
--- /dev/null
+++ b/src/test/modules/tablesample/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/tablesample/Makefile b/src/test/modules/tablesample/Makefile
new file mode 100644
index 0000000..469b004
--- /dev/null
+++ b/src/test/modules/tablesample/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/tablesample/Makefile
+
+MODULE_big = tsm_test
+OBJS = tsm_test.o $(WIN32RES)
+PGFILEDESC = "tsm_test - example of a custom tablesample method"
+
+EXTENSION = tsm_test
+DATA = tsm_test--1.0.sql
+
+REGRESS = tablesample
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/tablesample/expected/tablesample.out b/src/test/modules/tablesample/expected/tablesample.out
new file mode 100644
index 0000000..970b765
--- /dev/null
+++ b/src/test/modules/tablesample/expected/tablesample.out
@@ -0,0 +1,38 @@
+CREATE EXTENSION tsm_test;
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ c81e728d9d4c2f636f067f89cc14862c | 0.5
+ a87ff679a2f3e71d9181a67b7542122c | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ d3d9446802a44259755d38e6d163e820 | 0.5
+(6 rows)
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ e4da3b7fbbce2345d7772b0674a318d5 | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ c9f0f895fb98ab9159f51fd0297e236d | 0.5
+(5 rows)
+
+DROP TABLESAMPLE METHOD tsm_test;
+ERROR: cannot drop tablesample method tsm_test because extension tsm_test requires it
+HINT: You can drop extension tsm_test instead.
+DROP EXTENSION tsm_test;
+ERROR: cannot drop extension tsm_test because other objects depend on it
+DETAIL: view test_tsm_v depends on tablesample method tsm_test
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
+DROP EXTENSION tsm_test CASCADE;
+NOTICE: drop cascades to view test_tsm_v
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
+ tsmname | tsminit | tsmnextblock | tsmnexttuple | tsmreturntuple | tsmend | tsmreset | tsmcost
+---------+---------+--------------+--------------+----------------+--------+----------+---------
+(0 rows)
+
diff --git a/src/test/modules/tablesample/sql/tablesample.sql b/src/test/modules/tablesample/sql/tablesample.sql
new file mode 100644
index 0000000..b1104d6
--- /dev/null
+++ b/src/test/modules/tablesample/sql/tablesample.sql
@@ -0,0 +1,14 @@
+CREATE EXTENSION tsm_test;
+
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+
+DROP TABLESAMPLE METHOD tsm_test;
+DROP EXTENSION tsm_test;
+DROP EXTENSION tsm_test CASCADE;
+
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
diff --git a/src/test/modules/tablesample/tsm_test--1.0.sql b/src/test/modules/tablesample/tsm_test--1.0.sql
new file mode 100644
index 0000000..2814e37
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test--1.0.sql
@@ -0,0 +1,50 @@
+/* src/test/modules/tablesample/tsm_test--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION tsm_test" to load this file. \quit
+
+CREATE FUNCTION tsm_test_init(internal, int4, text)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nextblock(internal, bool)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nexttuple(internal, int4, int2, bool)
+RETURNS int2
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_returntuple(internal, int4, internal, bool)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_end(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_reset(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_cost(internal, internal, internal, internal, internal, internal, internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+
+CREATE TABLESAMPLE METHOD tsm_test (
+ INIT = tsm_test_init,
+ NEXTBLOCK = tsm_test_nextblock,
+ NEXTTUPLE = tsm_test_nexttuple,
+ RETURNTUPLE = tsm_test_returntuple,
+ END = tsm_test_end,
+ RESET = tsm_test_reset,
+ COST = tsm_test_cost
+);
diff --git a/src/test/modules/tablesample/tsm_test.c b/src/test/modules/tablesample/tsm_test.c
new file mode 100644
index 0000000..110322a
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.c
@@ -0,0 +1,218 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_test.c
+ * Simple example of a custom tablesample method
+ *
+ * Copyright (c) 2007-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/tablesample/tsm_test.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "access/tupdesc.h"
+#include "catalog/pg_type.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+PG_MODULE_MAGIC;
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ AttrNumber attnum; /* column to check */
+ TupleDesc tupDesc; /* tuple descriptor of table */
+ BlockNumber tblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} tsm_test_state;
+
+
+PG_FUNCTION_INFO_V1(tsm_test_init);
+PG_FUNCTION_INFO_V1(tsm_test_nextblock);
+PG_FUNCTION_INFO_V1(tsm_test_nexttuple);
+PG_FUNCTION_INFO_V1(tsm_test_returntuple);
+PG_FUNCTION_INFO_V1(tsm_test_end);
+PG_FUNCTION_INFO_V1(tsm_test_reset);
+PG_FUNCTION_INFO_V1(tsm_test_cost);
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_test_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ char *attname;
+ AttrNumber attnum;
+ Oid atttype;
+ Relation rel = scanstate->ss.ss_currentRelation;
+ tsm_test_state *state;
+ TupleDesc tupDesc = RelationGetDescr(rel);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("Column name cannot be NULL.")));
+
+ attname = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+ attnum = get_attnum(rel->rd_id, attname);
+ if (!AttrNumberIsForUserDefinedAttr(attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s does not exist.", attname)));
+
+ atttype = get_atttype(rel->rd_id, attnum);
+ if (atttype != FLOAT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s is not of type float.", attname)));
+
+ state = palloc0(sizeof(tsm_test_state));
+
+ /* Remember initial values for reinit */
+ state->seed = seed;
+ state->attnum = attnum;
+ state->tupDesc = tupDesc;
+ state->tblocks = RelationGetNumberOfBlocks(rel);
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+ sampler_random_init_state(state->seed, state->randstate);
+
+ scanstate->tsmdata = (void *) state;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_test_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ if (state->blockno == InvalidBlockNumber)
+ state->blockno = 0;
+ else if (++state->blockno >= state->tblocks)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ PG_RETURN_UINT32(state->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ */
+Datum
+tsm_test_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ state->lt = (state->lt == InvalidOffsetNumber) ?
+ FirstOffsetNumber : state->lt + 1;
+ if (state->lt > maxoffset)
+ state->lt = InvalidOffsetNumber; /* rewind so the next block starts at the first tuple */
+
+ PG_RETURN_UINT16(state->lt);
+}
+
+/*
+ * Examine tuple and decide if it should be returned.
+ */
+Datum
+tsm_test_returntuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ HeapTuple tuple = (HeapTuple) PG_GETARG_POINTER(2);
+ bool visible = PG_GETARG_BOOL(3);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+ bool isnull;
+ float8 val, rand;
+
+ if (!visible)
+ PG_RETURN_BOOL(false);
+
+ val = DatumGetFloat8(heap_getattr(tuple, state->attnum, state->tupDesc, &isnull));
+ rand = sampler_random_fract(state->randstate);
+ if (isnull || val < rand)
+ PG_RETURN_BOOL(false);
+ else
+ PG_RETURN_BOOL(true);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_test_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_test_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+
+ sampler_random_init_state(state->seed, state->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_test_cost(PG_FUNCTION_ARGS)
+{
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(3);
+ double *tuples = (double *) PG_GETARG_POINTER(4);
+
+ SamplerAccessStrategy *strategy =
+ (SamplerAccessStrategy *) PG_GETARG_POINTER(5);
+
+ *strategy = SAS_SEQUENTIAL;
+
+ *pages = baserel->pages;
+ /* This is a very rough estimate. */
+ *tuples = baserel->tuples/2;
+
+ PG_RETURN_VOID();
+}
+
diff --git a/src/test/modules/tablesample/tsm_test.control b/src/test/modules/tablesample/tsm_test.control
new file mode 100644
index 0000000..a7b2741
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.control
@@ -0,0 +1,5 @@
+# tsm_test extension
+comment = 'test module for custom tablesample method'
+default_version = '1.0'
+module_pathname = '$libdir/tsm_test'
+relocatable = true
--
1.9.1
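
For readers who want to see how the nextblock / nexttuple / returntuple callbacks of tsm_test fit together, here is a minimal standalone sketch. It is not PostgreSQL code: the demo_* names, the DEMO_* constants and the driver loop in demo_scan() are all hypothetical stand-ins, the real executor loop is not part of this hunk, and the random draw in returntuple is replaced by a fixed threshold so the result is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for InvalidBlockNumber and the offset constants. */
#define DEMO_INVALID_BLOCK	UINT32_MAX
#define DEMO_INVALID_OFFSET 0
#define DEMO_FIRST_OFFSET	1

typedef struct
{
	uint32_t	tblocks;	/* total blocks in the pretend relation */
	uint32_t	blockno;	/* current block */
	uint16_t	lt;			/* last tuple offset handed out */
} demo_state;

static void
demo_init(demo_state *s, uint32_t tblocks)
{
	s->tblocks = tblocks;
	s->blockno = DEMO_INVALID_BLOCK;
	s->lt = DEMO_INVALID_OFFSET;
}

/* Walk the blocks sequentially; report the end of the relation. */
static uint32_t
demo_nextblock(demo_state *s)
{
	if (s->blockno == DEMO_INVALID_BLOCK)
		s->blockno = 0;
	else
		s->blockno++;

	if (s->blockno >= s->tblocks)
		return DEMO_INVALID_BLOCK;
	return s->blockno;
}

/* Walk the offsets of one block; rewind once the block is exhausted. */
static uint16_t
demo_nexttuple(demo_state *s, uint16_t maxoffset)
{
	s->lt = (s->lt == DEMO_INVALID_OFFSET) ?
		DEMO_FIRST_OFFSET : (uint16_t) (s->lt + 1);
	if (s->lt > maxoffset)
		s->lt = DEMO_INVALID_OFFSET;
	return s->lt;
}

/* Keep a tuple when its column value is at least the drawn number. */
static bool
demo_returntuple(double val, double draw)
{
	return val >= draw;
}

/* Assumed driver loop: count the tuples accepted over the whole relation. */
static int
demo_scan(uint32_t tblocks, uint16_t tuples_per_block,
		  double val, double draw)
{
	demo_state	s;
	int			accepted = 0;

	demo_init(&s, tblocks);
	while (demo_nextblock(&s) != DEMO_INVALID_BLOCK)
	{
		while (demo_nexttuple(&s, tuples_per_block) != DEMO_INVALID_OFFSET)
		{
			if (demo_returntuple(val, draw))
				accepted++;
		}
	}
	return accepted;
}
```

With three blocks of four tuples each, every tuple is visited exactly once, so a draw below the column value accepts all twelve tuples and a draw above it accepts none.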
Attachment: 0001-separate-block-sampling-functions-v2.patch (text/x-diff)
From 92959256c7dd892e9aa0f6921ffb2dfd552e28b4 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:36:56 +0100
Subject: [PATCH 1/3] separate block sampling functions v2
---
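
As background for the reservoir_* and sampler_random_fract() calls this patch introduces, here is a minimal standalone sketch of reservoir sampling. Note the assumptions: it uses the simple textbook variant (Algorithm R, one random draw per row), not the skip-count approach that reservoir_get_next_S() suggests, and the demo_* names and the small LCG generator are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sampler's random state: a 64-bit LCG. */
typedef struct
{
	uint64_t	state;
} demo_rand_state;

static void
demo_rand_init(demo_rand_state *rs, uint32_t seed)
{
	rs->state = seed;
}

/* Return a pseudo-random fraction in [0, 1). */
static double
demo_rand_fract(demo_rand_state *rs)
{
	rs->state = rs->state * 6364136223846793005ULL + 1442695040888963407ULL;
	/* use the top 53 bits so the quotient is strictly below 1.0 */
	return (double) (rs->state >> 11) / 9007199254740992.0;
}

/*
 * Algorithm R: keep the first targrows rows, then for the row at 0-based
 * position t pick j uniformly in [0, t] and overwrite slot j if it falls
 * inside the reservoir.  Returns the number of rows stored in sample[].
 */
static size_t
demo_reservoir_sample(const int *rows, size_t nrows,
					  int *sample, size_t targrows,
					  demo_rand_state *rs)
{
	size_t		numrows = 0;
	size_t		t;

	assert(targrows > 0);

	for (t = 0; t < nrows; t++)
	{
		if (numrows < targrows)
			sample[numrows++] = rows[t];
		else
		{
			size_t		j = (size_t) ((double) (t + 1) * demo_rand_fract(rs));

			if (j < targrows)
				sample[j] = rows[t];
		}
	}
	return numrows;
}
```

The replacement step plays the same role as the "k = (int) (targrows * sampler_random_fract())" line in file_acquire_sample_rows(); the difference is that the code in the patch first computes how many rows to skip instead of drawing a number for every row.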
contrib/file_fdw/file_fdw.c | 9 +-
contrib/postgres_fdw/postgres_fdw.c | 10 +-
src/backend/commands/analyze.c | 225 +----------------------------------
src/backend/utils/misc/Makefile | 2 +-
src/backend/utils/misc/sampling.c | 226 ++++++++++++++++++++++++++++++++++++
src/include/commands/vacuum.h | 3 -
src/include/utils/sampling.h | 44 +++++++
7 files changed, 287 insertions(+), 232 deletions(-)
create mode 100644 src/backend/utils/misc/sampling.c
create mode 100644 src/include/utils/sampling.h
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index d569760..df732c0 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -34,6 +34,7 @@
#include "optimizer/var.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -1005,7 +1006,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
{
int numrows = 0;
double rowstoskip = -1; /* -1 means not set yet */
- double rstate;
+ ReservoirStateData rstate;
TupleDesc tupDesc;
Datum *values;
bool *nulls;
@@ -1043,7 +1044,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
ALLOCSET_DEFAULT_MAXSIZE);
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
@@ -1087,7 +1088,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* not-yet-incremented value of totalrows as t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(*totalrows, targrows, &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, *totalrows, targrows);
if (rowstoskip <= 0)
{
@@ -1095,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d76e739..cbcba6e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -37,6 +37,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -202,7 +203,7 @@ typedef struct PgFdwAnalyzeState
/* for random sampling */
double samplerows; /* # of rows fetched */
double rowstoskip; /* # of rows to skip before next sample */
- double rstate; /* random state */
+ ReservoirStateData rstate; /* state for reservoir sampling */
/* working memory contexts */
MemoryContext anl_cxt; /* context for per-analyze lifespan data */
@@ -2393,7 +2394,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
astate.numrows = 0;
astate.samplerows = 0;
astate.rowstoskip = -1; /* -1 means not set yet */
- astate.rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&astate.rstate, targrows);
/* Remember ANALYZE context, and create a per-tuple temp context */
astate.anl_cxt = CurrentMemoryContext;
@@ -2533,13 +2534,12 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
* analyze.c; see Jeff Vitter's paper.
*/
if (astate->rowstoskip < 0)
- astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
- &astate->rstate);
+ astate->rowstoskip = reservoir_get_next_S(&astate->rstate, astate->samplerows, targrows);
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * anl_random_fract());
+ pos = (int) (targrows * sampler_random_fract());
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d2856a3..fc9dd44 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -88,10 +78,6 @@ static BufferAccessStrategy vac_strategy;
static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -947,94 +933,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1080,7 +978,7 @@ acquire_sample_rows(Relation onerel, int elevel,
BlockNumber totalblocks;
TransactionId OldestXmin;
BlockSamplerData bs;
- double rstate;
+ ReservoirStateData rstate;
Assert(targrows > 0);
@@ -1090,9 +988,9 @@ acquire_sample_rows(Relation onerel, int elevel,
OldestXmin = GetOldestXmin(onerel, true);
/* Prepare for sampling block numbers */
- BlockSampler_Init(&bs, totalblocks, targrows);
+ BlockSampler_Init(&bs, totalblocks, targrows, random());
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Outer loop over blocks to sample */
while (BlockSampler_HasMore(&bs))
@@ -1240,8 +1138,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(samplerows, targrows,
- &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, samplerows, targrows);
if (rowstoskip <= 0)
{
@@ -1249,7 +1146,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1308,116 +1205,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
-/*
- * These two routines embody Algorithm Z from "Random sampling with a
- * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
- * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
- * of the count S of records to skip before processing another record.
- * It is computed primarily based on t, the number of records already read.
- * The only extra state needed between calls is W, a random state variable.
- *
- * anl_init_selection_state computes the initial W value.
- *
- * Given that we've already read t records (t >= n), anl_get_next_S
- * determines the number of records to skip before the next record is
- * processed.
- */
-double
-anl_init_selection_state(int n)
-{
- /* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
-}
-
-double
-anl_get_next_S(double t, int n, double *stateptr)
-{
- double S;
-
- /* The magic constant here is T from Vitter's paper */
- if (t <= (22.0 * n))
- {
- /* Process records using Algorithm X until t is large enough */
- double V,
- quot;
-
- V = anl_random_fract(); /* Generate V */
- S = 0;
- t += 1;
- /* Note: "num" in Vitter's code is always equal to t - n */
- quot = (t - (double) n) / t;
- /* Find min S satisfying (4.1) */
- while (quot > V)
- {
- S += 1;
- t += 1;
- quot *= (t - (double) n) / t;
- }
- }
- else
- {
- /* Now apply Algorithm Z */
- double W = *stateptr;
- double term = t - (double) n + 1;
-
- for (;;)
- {
- double numer,
- numer_lim,
- denom;
- double U,
- X,
- lhs,
- rhs,
- y,
- tmp;
-
- /* Generate U and X */
- U = anl_random_fract();
- X = t * (W - 1.0);
- S = floor(X); /* S is tentatively set to floor(X) */
- /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
- tmp = (t + 1) / term;
- lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
- rhs = (((t + X) / (term + S)) * term) / t;
- if (lhs <= rhs)
- {
- W = rhs / lhs;
- break;
- }
- /* Test if U <= f(S)/cg(X) */
- y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
- if ((double) n < S)
- {
- denom = t;
- numer_lim = term + S;
- }
- else
- {
- denom = t - (double) n + S;
- numer_lim = t + 1;
- }
- for (numer = t + S; numer >= numer_lim; numer -= 1)
- {
- y *= numer / denom;
- denom -= 1;
- }
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
- if (exp(log(y) / n) <= (t + X) / t)
- break;
- }
- *stateptr = W;
- }
- return S;
-}
-
/*
* qsort comparator for sorting rows[] array
*/
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 449d5b4..848ba29 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..1eeabaf
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Relation block sampling routines.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler provides an algorithm for block-level sampling of a relation,
+ * as discussed on pgsql-hackers 2004-04-02 (subject "Large DB").
+ * It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has fewer than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
+ long randseed)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires a sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+/*
+ * These two routines embody Algorithm Z from "Random sampling with a
+ * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
+ * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
+ * of the count S of records to skip before processing another record.
+ * It is computed primarily based on t, the number of records already read.
+ * The only extra state needed between calls is W, a random state variable.
+ *
+ * reservoir_init_selection_state computes the initial W value.
+ *
+ * Given that we've already read t records (t >= n), reservoir_get_next_S
+ * determines the number of records to skip before the next record is
+ * processed.
+ */
+void
+reservoir_init_selection_state(ReservoirState rs, int n)
+{
+ /* Initial value of W (for use when Algorithm Z is first applied) */
+ *rs = exp(-log(sampler_random_fract()) / n);
+}
+
+double
+reservoir_get_next_S(ReservoirState rs, double t, int n)
+{
+ double S;
+
+ /* The magic constant here is T from Vitter's paper */
+ if (t <= (22.0 * n))
+ {
+ /* Process records using Algorithm X until t is large enough */
+ double V,
+ quot;
+
+ V = sampler_random_fract(); /* Generate V */
+ S = 0;
+ t += 1;
+ /* Note: "num" in Vitter's code is always equal to t - n */
+ quot = (t - (double) n) / t;
+ /* Find min S satisfying (4.1) */
+ while (quot > V)
+ {
+ S += 1;
+ t += 1;
+ quot *= (t - (double) n) / t;
+ }
+ }
+ else
+ {
+ /* Now apply Algorithm Z */
+ double W = *rs;
+ double term = t - (double) n + 1;
+
+ for (;;)
+ {
+ double numer,
+ numer_lim,
+ denom;
+ double U,
+ X,
+ lhs,
+ rhs,
+ y,
+ tmp;
+
+ /* Generate U and X */
+ U = sampler_random_fract();
+ X = t * (W - 1.0);
+ S = floor(X); /* S is tentatively set to floor(X) */
+ /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
+ tmp = (t + 1) / term;
+ lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
+ rhs = (((t + X) / (term + S)) * term) / t;
+ if (lhs <= rhs)
+ {
+ W = rhs / lhs;
+ break;
+ }
+ /* Test if U <= f(S)/cg(X) */
+ y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
+ if ((double) n < S)
+ {
+ denom = t;
+ numer_lim = term + S;
+ }
+ else
+ {
+ denom = t - (double) n + S;
+ numer_lim = t + 1;
+ }
+ for (numer = t + S; numer >= numer_lim; numer -= 1)
+ {
+ y *= numer / denom;
+ denom -= 1;
+ }
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ if (exp(log(y) / n) <= (t + X) / t)
+ break;
+ }
+ *rs = W;
+ }
+ return S;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+
+/* Select a random value R uniformly distributed in (0 - 1) */
+double
+sampler_random_fract()
+{
+ return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+}
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4275484..d38fead 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -178,8 +178,5 @@ extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
bool in_outer_xact, BufferAccessStrategy bstrategy);
extern bool std_typanalyze(VacAttrStats *stats);
-extern double anl_random_fract(void);
-extern double anl_init_selection_state(int n);
-extern double anl_get_next_S(double t, int n, double *stateptr);
#endif /* VACUUM_H */
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..e3e7f9c
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+extern double sampler_random_fract(void);
+
+/* Block sampling methods */
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize, long randseed);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Reservoir sampling methods */
+typedef double ReservoirStateData;
+typedef ReservoirStateData *ReservoirState;
+
+extern void reservoir_init_selection_state(ReservoirState rs, int n);
+extern double reservoir_get_next_S(ReservoirState rs, double t, int n);
+
+#endif /* SAMPLING_H */
--
1.9.1
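
For readers following the BlockSampler comments in sampling.c above, here is a
minimal standalone sketch of the naive form of Knuth's Algorithm S that those
comments refer to: one random draw per block visited, selecting the current
block with probability k/K. It is not the code from the patch, which reuses a
single draw per selected block. The names demo_random_fract and
demo_sample_blocks are hypothetical, and rand() stands in for the backend's
random() so that the example compiles outside PostgreSQL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for sampler_random_fract(): uniform in (0, 1). */
static double
demo_random_fract(void)
{
	return ((double) rand() + 1.0) / ((double) RAND_MAX + 2.0);
}

/*
 * Naive form of Knuth's Algorithm S (illustration only, not the patch code):
 * pick min(n, N) of the block numbers 0..N-1 in ascending order, using one
 * random draw per block visited.  Returns the number of entries written to
 * out[], which must have room for at least min(n, N) elements.
 */
static int
demo_sample_blocks(unsigned N, int n, unsigned *out)
{
	unsigned	t;			/* blocks visited so far */
	int			m = 0;		/* blocks selected so far */

	assert(n >= 0);

	for (t = 0; t < N && m < n; t++)
	{
		unsigned	K = N - t;	/* remaining blocks, including this one */
		int			k = n - m;	/* blocks still to select */

		/*
		 * Select the current block with probability k/K.  When k >= K we
		 * need every remaining block, so no random draw is made.
		 */
		if ((unsigned) k >= K || demo_random_fract() * K < k)
			out[m++] = t;
	}

	return m;
}
```

Because a block is always taken once k reaches K, the loop cannot run out of
blocks before the sample is full. That is the same invariant the comment in
BlockSampler_Next argues for its one-draw-per-selected-block variant.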
Attachment: 0002-tablesample-v7.patch (text/x-diff)
From f7a4c015a74ff0c8cfe5bc76b3a67d134d277d6b Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:37:55 +0100
Subject: [PATCH 2/3] tablesample v7
---
contrib/file_fdw/file_fdw.c | 2 +-
contrib/postgres_fdw/postgres_fdw.c | 2 +-
doc/src/sgml/ref/select.sgml | 38 ++-
src/backend/access/Makefile | 3 +-
src/backend/catalog/Makefile | 2 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/explain.c | 7 +
src/backend/executor/Makefile | 2 +-
src/backend/executor/execAmi.c | 8 +
src/backend/executor/execCurrent.c | 1 +
src/backend/executor/execProcnode.c | 14 +
src/backend/executor/nodeSamplescan.c | 448 ++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 58 ++++
src/backend/nodes/equalfuncs.c | 35 +++
src/backend/nodes/nodeFuncs.c | 12 +
src/backend/nodes/outfuncs.c | 46 +++
src/backend/nodes/readfuncs.c | 43 +++
src/backend/optimizer/path/allpaths.c | 31 ++
src/backend/optimizer/path/costsize.c | 70 +++++
src/backend/optimizer/plan/createplan.c | 69 +++++
src/backend/optimizer/plan/setrefs.c | 11 +
src/backend/optimizer/plan/subselect.c | 1 +
src/backend/optimizer/util/pathnode.c | 22 ++
src/backend/parser/gram.y | 40 ++-
src/backend/parser/parse_clause.c | 38 ++-
src/backend/parser/parse_func.c | 129 ++++++++
src/backend/utils/Makefile | 3 +-
src/backend/utils/adt/ruleutils.c | 50 ++++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/misc/sampling.c | 33 +-
src/backend/utils/tablesample/Makefile | 17 ++
src/backend/utils/tablesample/bernoulli.c | 206 +++++++++++++
src/backend/utils/tablesample/system.c | 187 ++++++++++++
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_proc.h | 25 ++
src/include/catalog/pg_tablesample_method.h | 73 +++++
src/include/executor/nodeSamplescan.h | 24 ++
src/include/nodes/execnodes.h | 21 ++
src/include/nodes/nodes.h | 4 +
src/include/nodes/parsenodes.h | 34 +++
src/include/nodes/plannodes.h | 6 +
src/include/optimizer/cost.h | 1 +
src/include/optimizer/pathnode.h | 2 +
src/include/parser/kwlist.h | 3 +-
src/include/parser/parse_func.h | 4 +
src/include/utils/rel.h | 1 -
src/include/utils/sampling.h | 15 +-
src/include/utils/syscache.h | 2 +
src/include/utils/tablesample.h | 33 ++
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/expected/tablesample.out | 165 ++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/tablesample.sql | 39 +++
54 files changed, 2087 insertions(+), 27 deletions(-)
create mode 100644 src/backend/executor/nodeSamplescan.c
create mode 100644 src/backend/utils/tablesample/Makefile
create mode 100644 src/backend/utils/tablesample/bernoulli.c
create mode 100644 src/backend/utils/tablesample/system.c
create mode 100644 src/include/catalog/pg_tablesample_method.h
create mode 100644 src/include/executor/nodeSamplescan.h
create mode 100644 src/include/utils/tablesample.h
create mode 100644 src/test/regress/expected/tablesample.out
create mode 100644 src/test/regress/sql/tablesample.sql
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index df732c0..3fc3962 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1096,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index cbcba6e..95f196e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2539,7 +2539,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * sampler_random_fract());
+ pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate));
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..407bf9d 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,42 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ A <literal>TABLESAMPLE</literal> clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of the rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ one of:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept only a single argument,
+ which is the percentage (floating point, from 0 to 100) of the rows to
+ be returned.
+ The <literal>SYSTEM</literal> sampling method does block-level
+ sampling, with each block having the same chance of being selected, and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> method scans the whole table and returns
+ individual rows with equal probability.
+ The optional numeric parameter <literal>REPEATABLE</literal> is used
+ as the random seed for sampling. Note that subsequent commands may return
+ different results even if the same <literal>REPEATABLE</literal> clause
+ was specified. This happens because <acronym>DML</acronym> statements
+ and maintenance operations such as <command>VACUUM</> affect the physical
+ distribution of data.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..238057a 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ transam
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..34db3e6 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesample_method.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fc9dd44..63feb07 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1146,7 +1146,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7cfc9bb..22525af 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -732,6 +732,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -958,6 +959,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ pname = sname = "Sample Scan";
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1075,6 +1079,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1327,6 +1332,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2230,6 +2236,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 6ebad2f..4948a26 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index 1c8be25..5cfe549 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 9892499..03c2feb 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..1b9f366
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,448 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/pg_tablesample_method.h"
+#include "executor/executor.h"
+#include "executor/nodeSamplescan.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "storage/bufmgr.h"
+#include "storage/predicate.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate, int eflags);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+
+/*
+ * Initialize the sampling method - loads function info and
+ * calls the tsminit function.
+ *
+ * We need special handling for this because the tsminit function
+ * is allowed to take optional additional arguments.
+ */
+static void
+InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(scanstate->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(scanstate->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(scanstate->tsmnexttuple));
+ if (OidIsValid(tablesample->tsmreturntuple))
+ fmgr_info(tablesample->tsmreturntuple, &(scanstate->tsmreturntuple));
+ else
+ scanstate->tsmreturntuple.fn_oid = InvalidOid;
+ fmgr_info(tablesample->tsmend, &(scanstate->tsmend));
+ fmgr_info(tablesample->tsmreset, &(scanstate->tsmreset));
+
+ InitFunctionCallInfoData(fcinfo, &scanstate->tsminit,
+ list_length(args) + 2,
+ InvalidOid, NULL, NULL);
+
+ /* First arg is always SampleScanState */
+ fcinfo.arg[0] = PointerGetDatum(scanstate);
+ fcinfo.argnull[0] = false;
+
+ /*
+ * Second arg is always the REPEATABLE seed.
+ * When tablesample->repeatable is NULL, the REPEATABLE clause was not
+ * specified and a random seed is used instead.
+ * When specified, the expression cannot evaluate to NULL.
+ */
+ if (tablesample->repeatable)
+ {
+ ExprState *argstate = ExecInitExpr((Expr *) tablesample->repeatable,
+ (PlanState *) scanstate);
+ fcinfo.arg[1] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[1], NULL);
+ if (fcinfo.argnull[1])
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("REPEATABLE clause cannot be NULL")));
+ }
+ else
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+
+ i = 2;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ (void) FunctionCallInvoke(&fcinfo);
+}
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is a workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ EState *estate;
+ TupleTableSlot *slot;
+ BlockNumber blockno = InvalidBlockNumber;
+ Snapshot snapshot;
+ Relation relation;
+ bool found = false;
+ bool retry = false;
+ Buffer buffer;
+ Page page;
+ HeapTuple tuple = &(node->tup);
+ OffsetNumber tupoffset,
+ maxoffset;
+ bool all_visible;
+
+ /*
+ * get information from the estate and scan state
+ */
+ estate = node->ss.ps.state;
+ snapshot = estate->es_snapshot;
+ slot = node->ss.ss_ScanTupleSlot;
+ relation = node->ss.ss_currentRelation;
+ buffer = node->openbuffer;
+
+ if (BufferIsValid(buffer))
+ {
+ blockno = BufferGetBlockNumber(buffer);
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery;
+ }
+
+ /*
+ * get the next tuple from the table
+ */
+ for (;;)
+ {
+ ItemId itemid;
+ bool visible;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Load next block if needed. */
+ if (!BufferIsValid(buffer))
+ {
+ blockno = DatumGetUInt32(FunctionCall2(&node->tsmnextblock,
+ PointerGetDatum(node),
+ BoolGetDatum(retry)));
+ /* No more blocks to fetch */
+ if (!BlockNumberIsValid(blockno))
+ break;
+
+ buffer = ReadBufferExtended(relation, MAIN_FORKNUM, blockno,
+ RBM_NORMAL, NULL);
+ /*
+ * Prune and repair fragmentation for the whole page, if possible.
+ */
+ heap_page_prune_opt(relation, buffer);
+
+ /*
+ * Lock the buffer so we can safely assess tuple
+ * visibility later.
+ */
+ LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+ node->openbuffer = buffer;
+ page = BufferGetPage(buffer);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery;
+ }
+
+ tupoffset = DatumGetUInt16(FunctionCall4(&node->tsmnexttuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset),
+ BoolGetDatum(retry)));
+ /* Go to next block. */
+ if (!OffsetNumberIsValid(tupoffset))
+ {
+ UnlockReleaseBuffer(buffer);
+ node->openbuffer = buffer = InvalidBuffer;
+ continue;
+ }
+ retry = true;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_tableOid = RelationGetRelid(relation);
+ tuple->t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&tuple->t_self, blockno, tupoffset);
+
+ /* Check visibility. */
+ if (all_visible)
+ visible = true;
+ else
+ visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+ CheckForSerializableConflictOut(visible, relation, tuple, buffer, snapshot);
+
+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples.
+ */
+ if (OidIsValid(node->tsmreturntuple.fn_oid))
+ {
+ found = DatumGetBool(FunctionCall4(&node->tsmreturntuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ PointerGetDatum(tuple),
+ BoolGetDatum(visible)));
+ /* XXX: better error */
+ if (found && !visible)
+ elog(ERROR, "sampling method wanted to return invisible tuple");
+ }
+ else
+ found = visible;
+
+ /* Found visible tuple, return it. */
+ if (found)
+ break;
+ }
+
+ if (found)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ buffer, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation using the sampling method and returns
+ * the next qualifying tuple.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+ node->ss.ss_currentScanDesc = NULL;
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ scanstate->openbuffer = InvalidBuffer;
+
+ InitSamplingMethod(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell sampling function that we finished the scan.
+ */
+ FunctionCall1(&node->tsmend, PointerGetDatum(node));
+
+ if (BufferIsValid(node->openbuffer))
+ {
+ UnlockReleaseBuffer(node->openbuffer);
+ node->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *scanstate)
+{
+ if (BufferIsValid(scanstate->openbuffer))
+ {
+ UnlockReleaseBuffer(scanstate->openbuffer);
+ scanstate->openbuffer = InvalidBuffer;
+ }
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ FunctionCall1(&scanstate->tsmreset, PointerGetDatum(scanstate));
+
+ ExecScanReScan(&scanstate->ss);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f1a24f5..383bbf6 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -628,6 +628,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2006,6 +2022,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2138,6 +2155,38 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmreturntuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_SCALAR_FIELD(tsmcost);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4075,6 +4124,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4723,6 +4775,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_PrivGrantee:
retval = _copyPrivGrantee(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 6e8b308..07c69c9 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2323,6 +2323,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2442,6 +2443,34 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmreturntuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_SCALAR_FIELD(tsmcost);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3150,6 +3179,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_PrivGrantee:
retval = _equalPrivGrantee(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 21dfda7..bd9ce09 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3209,6 +3209,18 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->repeatable, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index dd1278b..c343732 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -578,6 +578,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -2391,6 +2399,34 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmreturntuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2420,6 +2456,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2887,6 +2924,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3228,6 +3268,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index ae24d05..d08d8e9 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,44 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmreturntuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_OID_FIELD(tsmcost);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1216,6 +1254,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1311,6 +1350,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..9a16e46 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,8 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +425,30 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+ Path *path;
+
+ /*
+ * We don't support pushing join clauses into the quals of a sample scan, but
+ * it could still have required parameterization due to LATERAL refs in
+ * its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ path = create_samplescan_path(root, rel, required_outer);
+ rel->pathlist = list_make1(path);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 020558b..8342392a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -90,6 +90,7 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
#include "utils/spccache.h"
+#include "utils/tablesample.h"
#include "utils/tuplesort.h"
@@ -219,6 +220,75 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * From the planner/optimizer perspective, we don't care all that much about
+ * the cost itself, since there is always only one scan path to consider when
+ * a sampling scan is present, but the row-count estimate is still important.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'path' is the path to cost; its param_info is non-NULL if it is parameterized
+ */
+void
+cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+ BlockNumber pages;
+ double tuples;
+ RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
+ TableSampleClause *tablesample = rte->tablesample;
+ SamplerAccessStrategy strategy;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Call the sampling method's costing function. */
+ OidFunctionCall7(tablesample->tsmcost, PointerGetDatum(root),
+ PointerGetDatum(path), PointerGetDatum(baserel),
+ PointerGetDatum(tablesample->args),
+ PointerGetDatum(&pages), PointerGetDatum(&tuples),
+ PointerGetDatum(&strategy));
+
+ /* Mark the path with the correct row estimate */
+ if (path->param_info)
+ path->rows = path->param_info->ppi_rows;
+ else
+ path->rows = tuples;
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+
+ spc_page_cost = strategy == SAS_RANDOM ?
+ spc_random_page_cost : spc_seq_page_cost;
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_page_cost * pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, path->param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * tuples;
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 655be81..10a5e02 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -58,6 +58,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -100,6 +102,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -228,6 +231,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -343,6 +347,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -546,6 +557,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1133,6 +1145,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3318,6 +3369,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7703946..de33fc6 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -446,6 +446,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 78fb6b1..191624c 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2163,6 +2163,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 1395a21..014d670 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,26 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+Path *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ Path *pathnode = makeNode(Path);
+
+ pathnode->pathtype = T_SampleScan;
+ pathnode->parent = rel;
+ pathnode->param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->pathkeys = NIL; /* samplescan has unordered result */
+
+ cost_samplescan(pathnode, root, rel);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1921,6 +1941,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return (Path *) create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 36dac29..ac5e095 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -447,6 +447,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -611,8 +612,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10227,6 +10228,12 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample opt_alias_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10522,7 +10529,6 @@ relation_expr_list:
| relation_expr_list ',' relation_expr { $$ = lappend($1, $3); }
;
-
/*
* Given "UPDATE foo set set ...", we have to decide without looking any
* further ahead whether the first "set" is an alias or the UPDATE's SET
@@ -10552,6 +10558,31 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $2;
+ n->relation = $1;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = $4;
+ n->repeatable = $6;
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13334,7 +13365,6 @@ unreserved_keyword:
| RELATIVE_P
| RELEASE
| RENAME
- | REPEATABLE
| REPLACE
| REPLICA
| RESET
@@ -13509,6 +13539,7 @@ type_func_name_keyword:
| OVERLAPS
| RIGHT
| SIMILAR
+ | TABLESAMPLE
| VERBOSE
;
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+ | REPEATABLE
| RETURNING
| SELECT
| SESSION_USER
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 654dce6..03632d2 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -413,6 +416,19 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
+{
+ RangeTblEntry *rte;
+ TableSampleClause *tablesample = NULL;
+
+ rte = transformTableEntry(pstate, r->relation);
+ tablesample = ParseTableSample(pstate, r->method, r->repeatable, r->args);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -421,7 +437,7 @@ transformTableEntry(ParseState *pstate, RangeVar *r)
{
RangeTblEntry *rte;
- /* We need only build a range table entry */
+ /* We first need to build a range table entry */
rte = addRangeTableEntry(pstate, r, r->alias,
interpretInhOption(r->inhOpt), true);
@@ -1121,6 +1137,26 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index a200804..7ca1268 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesample_method.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -760,6 +762,133 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
+ List *sampleargs)
+{
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, initnargs;
+ Oid actual_arg_types[FUNC_MAX_ARGS];
+ Oid init_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ samplemethod)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmreturntuple = tsm->tsmreturntuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+ tablesample->tsmcost = tsm->tsmcost;
+
+ ReleaseSysCache(tuple);
+
+ /* Validate the parameters against init function definition. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ initnargs = procform->pronargs;
+ Assert(initnargs >= 3);
+
+ /*
+ * The first parameter passes the SampleScanState and the second is the
+ * seed (REPEATABLE). Skip processing for both here and just assert
+ * that their types are correct.
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ Assert(procform->proargtypes.values[1] == INT4OID);
+ initnargs -= 2;
+ memcpy(init_arg_types, procform->proargtypes.values + 2,
+ initnargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Process repeatable (seed) */
+ if (repeatable != NULL)
+ {
+ Node *arg = repeatable;
+
+ if (arg && IsA(arg, A_Const))
+ {
+ A_Const *con = (A_Const *) arg;
+
+ if (con->val.type == T_Null)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value"),
+ parser_errposition(pstate, con->location)));
+
+ }
+
+ arg = transformExpr(pstate, arg, EXPR_KIND_FROM_FUNCTION);
+ arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
+ tablesample->repeatable = arg;
+ }
+ else
+ tablesample->repeatable = NULL;
+
+ /* Transform the remaining arguments ... */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *arg = transformExpr(pstate, (Node *) lfirst(larg), EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ fargs = lappend(fargs, arg);
+
+ actual_arg_types[nargs++] = argtype;
+ }
+
+ /*
+ * Check if parameters are correct.
+ *
+ * XXX: can we do better at hinting here?
+ */
+ if (initnargs != nargs ||
+ !can_coerce_type(initnargs, actual_arg_types, init_arg_types,
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameters for tablesample method \"%s\"",
+ samplemethod)));
+
+ /* perform the necessary typecasting of arguments */
+ make_fn_arguments(pstate, fargs, actual_arg_types, init_arg_types);
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..9daa2ae 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,8 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time \
+ tablesample
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index c1d860c..8198fc7 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -31,6 +31,7 @@
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -343,6 +344,8 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags);
static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags, int wrapColumn);
+static void get_tablesample_def(TableSampleClause *tablesample,
+ deparse_context *context);
static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
TupleDesc resultDesc,
int prettyFlags, int wrapColumn, int startIndent);
@@ -4184,6 +4187,50 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
heap_close(ev_relation, AccessShareLock);
}
+/* ----------
+ * get_tablesample_def - Convert TableSampleClause back to SQL
+ * ----------
+ */
+static void
+get_tablesample_def(TableSampleClause *tablesample, deparse_context *context)
+{
+ StringInfo buf = context->buf;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ char *tsmname;
+ int nargs;
+ ListCell *l;
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tablesample->tsmid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("cache lookup failed for tablesample method %u",
+ tablesample->tsmid)));
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ tsmname = NameStr(tsm->tsmname);
+ appendStringInfo(buf, " TABLESAMPLE %s (", quote_identifier(tsmname));
+
+ ReleaseSysCache(tuple);
+
+ nargs = 0;
+ foreach(l, tablesample->args)
+ {
+ if (nargs++ > 0)
+ appendStringInfoString(buf, ", ");
+ get_rule_expr((Node *) lfirst(l), context, true);
+ }
+ appendStringInfoChar(buf, ')');
+
+ if (tablesample->repeatable != NULL)
+ {
+ appendStringInfoString(buf, " REPEATABLE (");
+ get_rule_expr(tablesample->repeatable, context, true);
+ appendStringInfoChar(buf, ')');
+ }
+}
/* ----------
* get_query_def - Parse back one query parsetree
@@ -8411,6 +8458,9 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
only_marker(rte),
generate_relation_name(rte->relid,
context->namespaces));
+
+ if (rte->tablesample)
+ get_tablesample_def(rte->tablesample, context);
break;
case RTE_SUBQUERY:
/* Subquery RTE */
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..3a8f01e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesample_method_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index 1eeabaf..f213c46 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -46,6 +46,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
bs->n = samplesize;
bs->t = 0; /* blocks scanned so far */
bs->m = 0; /* blocks selected so far */
+
+ sampler_random_init_state(randseed, bs->randstate);
}
bool
@@ -92,7 +94,7 @@ BlockSampler_Next(BlockSampler bs)
* less than k, which means that we cannot fail to select enough blocks.
*----------
*/
- V = sampler_random_fract();
+ V = sampler_random_fract(bs->randstate);
p = 1.0 - (double) k / (double) K;
while (V < p)
{
@@ -126,8 +128,14 @@ BlockSampler_Next(BlockSampler bs)
void
reservoir_init_selection_state(ReservoirState rs, int n)
{
+ /*
+ * Reservoir sampling is not used anywhere where it would need to return
+ * repeatable results so we can initialize it randomly.
+ */
+ sampler_random_init_state(random(), rs->randstate);
+
/* Initial value of W (for use when Algorithm Z is first applied) */
- *rs = exp(-log(sampler_random_fract()) / n);
+ rs->W = exp(-log(sampler_random_fract(rs->randstate)) / n);
}
double
@@ -142,7 +150,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
double V,
quot;
- V = sampler_random_fract(); /* Generate V */
+ V = sampler_random_fract(rs->randstate); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -158,7 +166,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
else
{
/* Now apply Algorithm Z */
- double W = *rs;
+ double W = rs->W;
double term = t - (double) n + 1;
for (;;)
@@ -174,7 +182,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
tmp;
/* Generate U and X */
- U = sampler_random_fract();
+ U = sampler_random_fract(rs->randstate);
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -203,11 +211,11 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract(rs->randstate)) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
- *rs = W;
+ rs->W = W;
}
return S;
}
@@ -217,10 +225,17 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
* Random number generator used by sampling
*----------
*/
+void
+sampler_random_init_state(long seed, SamplerRandomState randstate)
+{
+ randstate[0] = 0x330e;
+ randstate[1] = (unsigned short) seed;
+ randstate[2] = (unsigned short) (seed >> 16);
+}
/* Select a random value R uniformly distributed in (0 - 1) */
double
-sampler_random_fract()
+sampler_random_fract(SamplerRandomState randstate)
{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+ return pg_erand48(randstate);
}
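
The sampling.c changes above thread a SamplerRandomState through the block sampler so that the same seed picks the same blocks. The following is a minimal standalone sketch of that idea, not code from the patch. It assumes an erand48-style 48-bit generator as a stand-in for pg_erand48 and uses plain sequential selection (Knuth's Algorithm S) instead of the skip-based loop in BlockSampler_Next. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SamplerRandomState: one 48-bit LCG value. */
typedef struct
{
    uint64_t x;
} DemoRand;

/* Seed layout mirrors sampler_random_init_state(): 0x330e in the low word. */
static void
demo_rand_init(DemoRand *r, uint32_t seed)
{
    r->x = ((uint64_t) seed << 16) | 0x330e;
}

/* One erand48-style step; the result lies in [0, 1). */
static double
demo_rand_fract(DemoRand *r)
{
    r->x = (r->x * 0x5DEECE66DULL + 0xB) & 0xFFFFFFFFFFFFULL;
    return (double) r->x / 281474976710656.0;   /* divide by 2^48 */
}

/*
 * Pick min(n, nblocks) distinct block numbers from 0..nblocks-1 in
 * ascending order and write them to out[]. Returns how many were picked.
 */
static unsigned
demo_select_blocks(uint32_t seed, unsigned nblocks, unsigned n, unsigned *out)
{
    DemoRand r;
    unsigned t;
    unsigned m = 0;             /* blocks selected so far */

    if (n > nblocks)
        n = nblocks;
    demo_rand_init(&r, seed);
    for (t = 0; t < nblocks && m < n; t++)
    {
        /* Take block t with probability (n - m) / (nblocks - t). */
        if (demo_rand_fract(&r) * (double) (nblocks - t) < (double) (n - m))
            out[m++] = t;
    }
    return m;
}

/* Number of blocks picked for a request of at most 64 blocks. */
unsigned
demo_count(uint32_t seed, unsigned nblocks, unsigned n)
{
    unsigned out[64];

    assert(n <= 64);
    return demo_select_blocks(seed, nblocks, n, out);
}

/* Returns 1 if two runs with the same seed pick identical blocks. */
int
demo_same_seed_repeats(uint32_t seed, unsigned nblocks, unsigned n)
{
    unsigned a[64];
    unsigned b[64];
    unsigned i, na, nb;

    assert(n <= 64);
    na = demo_select_blocks(seed, nblocks, n, a);
    nb = demo_select_blocks(seed, nblocks, n, b);
    if (na != nb)
        return 0;
    for (i = 0; i < na; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Because the generator state is reinitialized from the seed on every call, a rescan with the same REPEATABLE value would visit the same blocks in this sketch, which is the property the reset functions in the patch rely on.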
diff --git a/src/backend/utils/tablesample/Makefile b/src/backend/utils/tablesample/Makefile
new file mode 100644
index 0000000..df92939
--- /dev/null
+++ b/src/backend/utils/tablesample/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/tablesample
+#
+# IDENTIFICATION
+# src/backend/utils/tablesample/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = system.o bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/tablesample/bernoulli.c b/src/backend/utils/tablesample/bernoulli.c
new file mode 100644
index 0000000..15e29d8
--- /dev/null
+++ b/src/backend/utils/tablesample/bernoulli.c
@@ -0,0 +1,206 @@
+/*-------------------------------------------------------------------------
+ *
+ * bernoulli.c
+ * interface routines for BERNOULLI tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ float4 probability; /* probability that tuple will be returned (0.0-1.0) */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} BernoulliSamplerData;
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ Relation rel = scanstate->ss.ss_currentRelation;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks = RelationGetNumberOfBlocks(rel);
+ sampler->blockno = InvalidBlockNumber;
+ sampler->probability = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = 0;
+ else if (++sampler->blockno >= sampler->tblocks)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ *
+ * This method implements the main logic of Bernoulli sampling.
+ * For each tuple offset it generates a new random number (in the 0.0-1.0
+ * range) and, if that number falls within the user-specified probability
+ * (in the same range), returns the tuple offset.
+ *
+ * If we reach the end of the block, return InvalidOffsetNumber, which tells
+ * SampleScan to go to the next block.
+ */
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ float4 probability = sampler->probability;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /*
+ * Loop over tuple offsets until the random generator returns a value that
+ * is within the probability of returning the tuple or until we reach
+ * the end of the block.
+ */
+ while (sampler_random_fract(sampler->randstate) > probability)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_bernoulli_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ SamplerAccessStrategy *strategy =
+ (SamplerAccessStrategy *) PG_GETARG_POINTER(6);
+
+ *strategy = SAS_SEQUENTIAL;
+ *pages = baserel->pages;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (!IsA(pctnode, Const))
+ {
+ *tuples = baserel->tuples * 0.1;
+ PG_RETURN_VOID();
+ }
+
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+
+ *tuples = baserel->tuples * samplesize;
+
+ PG_RETURN_VOID();
+}
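
To illustrate the per-tuple logic of tsm_bernoulli_nexttuple outside the executor, here is a minimal standalone sketch. It is not part of the patch. The demo_* names are hypothetical, the generator is an erand48-style stand-in for pg_erand48, and offset 0 plays the role of InvalidOffsetNumber. One assumption differs from the patch: a tuple is selected when the random value is strictly less than the probability (the patch keeps the tuple unless the value is greater), so that probability 0.0 never selects and 1.0 always selects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SamplerRandomState: one 48-bit LCG value. */
typedef struct
{
    uint64_t x;
} DemoRand;

/* Seed layout mirrors sampler_random_init_state(): 0x330e in the low word. */
static void
demo_rand_init(DemoRand *r, uint32_t seed)
{
    r->x = ((uint64_t) seed << 16) | 0x330e;
}

/* One erand48-style step; the result lies in [0, 1). */
static double
demo_rand_fract(DemoRand *r)
{
    r->x = (r->x * 0x5DEECE66DULL + 0xB) & 0xFFFFFFFFFFFFULL;
    return (double) r->x / 281474976710656.0;   /* divide by 2^48 */
}

/*
 * Return the next selected offset in last+1..maxoffset, or 0 when the
 * block is exhausted (0 stands in for InvalidOffsetNumber).
 */
static unsigned
demo_bernoulli_next(DemoRand *r, double probability,
                    unsigned last, unsigned maxoffset)
{
    unsigned off;

    assert(probability >= 0.0 && probability <= 1.0);
    for (off = last + 1; off <= maxoffset; off++)
    {
        /* One independent coin flip per tuple offset. */
        if (demo_rand_fract(r) < probability)
            return off;
    }
    return 0;
}

/* Count how many tuples of a single block the sampler returns. */
unsigned
demo_count_block(uint32_t seed, double probability, unsigned maxoffset)
{
    DemoRand r;
    unsigned off = 0;
    unsigned n = 0;

    demo_rand_init(&r, seed);
    while ((off = demo_bernoulli_next(&r, probability, off, maxoffset)) != 0)
        n++;
    return n;
}
```

Every offset costs exactly one random draw here, so the sample depends only on the seed and the block layout. That is what makes REPEATABLE meaningful for this method.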
diff --git a/src/backend/utils/tablesample/system.c b/src/backend/utils/tablesample/system.c
new file mode 100644
index 0000000..9d410d2
--- /dev/null
+++ b/src/backend/utils/tablesample/system.c
@@ -0,0 +1,187 @@
+/*-------------------------------------------------------------------------
+ *
+ * system.c
+ * interface routines for SYSTEM tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/*
+ * State
+ */
+typedef struct
+{
+ BlockSamplerData bs;
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ int samplesize; /* number of blocks to return */
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks =
+ RelationGetNumberOfBlocks(scanstate->ss.ss_currentRelation);
+ sampler->samplesize = 1 + (int) (sampler->tblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses the same logic as ANALYZE for picking the random blocks.
+ */
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 percent;
+
+ SamplerAccessStrategy *strategy =
+ (SamplerAccessStrategy *) PG_GETARG_POINTER(6);
+
+ *strategy = SAS_RANDOM;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (!IsA(pctnode, Const))
+ {
+ *pages = baserel->pages * 0.1;
+ *tuples = baserel->tuples * 0.1;
+ PG_RETURN_VOID();
+ }
+
+ percent = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ percent /= 100.0;
+
+ *pages = baserel->pages * percent;
+ *tuples = baserel->tuples * percent;
+
+ PG_RETURN_VOID();
+}
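
The two costing functions differ only in what they scale: SYSTEM scales both pages and tuples by the sample percentage, while BERNOULLI still reads every page and scales only tuples. Both fall back to 10% when the percentage is not a plan-time constant. A minimal standalone sketch of that arithmetic follows. The demo_* names are hypothetical, and a negative percent is used here purely as a convention for "not known at plan time".

```c
#include <assert.h>

/* Hypothetical result pair; the patch writes through *pages and *tuples. */
typedef struct
{
    double pages;
    double tuples;
} DemoEstimate;

/*
 * Estimate pages read and tuples returned by a sample scan.
 *
 * block_level: nonzero for SYSTEM-style sampling (whole blocks are
 *              skipped), zero for BERNOULLI-style (all blocks are read).
 * percent:     sample size in 0..100, or negative if unknown at plan time.
 */
DemoEstimate
demo_estimate(int block_level, double rel_pages, double rel_tuples,
              double percent)
{
    DemoEstimate e;
    double frac;

    assert(percent <= 100.0);

    /* Same 10% default the patch uses for non-constant arguments. */
    frac = (percent < 0.0) ? 0.1 : percent / 100.0;

    e.pages = block_level ? rel_pages * frac : rel_pages;
    e.tuples = rel_tuples * frac;
    return e;
}
```

For a relation of 1000 pages and 50000 tuples sampled at 25 percent, the block-level estimate scales both numbers down, while the tuple-level estimate keeps the full page count and scales only the row count.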
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..c711cca 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_name_index, 3281, on pg_tablesample_method using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3281
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_oid_index, 3282, on pg_tablesample_method using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3282
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 9edfdb8..e6d821d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5143,6 +5143,31 @@ DESCR("rank of hypothetical row without gaps");
DATA(insert OID = 3993 ( dense_rank_final PGNSP PGUID 12 1 0 2276 0 f f f f f f i 2 0 20 "2281 2276" "{2281,2276}" "{i,v}" _null_ _null_ hypothetical_dense_rank_final _null_ _null_ _null_ ));
DESCR("aggregate final function");
+DATA(insert OID = 3285 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3286 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3287 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3288 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3289 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+DATA(insert OID = 3290 ( tsm_system_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_system_cost _null_ _null_ _null_ ));
+DESCR("tsm_system_cost(internal)");
+
+DATA(insert OID = 3291 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3292 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3293 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3294 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3296 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
+DATA(insert OID = 3297 ( tsm_bernoulli_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_cost(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
new file mode 100644
index 0000000..ea11f45
--- /dev/null
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -0,0 +1,73 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesample_method.h
+ * definition of the table scan methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesample_method.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLE_METHOD_H
+#define PG_TABLESAMPLE_METHOD_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_tablesample_method definition. cpp turns this into
+ * typedef struct FormData_pg_tablesample_method
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3280
+
+CATALOG(pg_tablesample_method,3280)
+{
+ NameData tsmname; /* tablescan method name */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockOffset if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmreturntuple; /* optional function which can examine tuple contents and
+ decide if tuple should be returned or not */
+ regproc tsmend; /* end scan function*/
+ regproc tsmreset; /* reset state - used by rescan */
+ regproc tsmcost; /* costing function */
+} FormData_pg_tablesample_method;
+
+/* ----------------
+ * Form_pg_tablesample_method corresponds to a pointer to a tuple with
+ * the format of pg_tablesample_method relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
+
+/* ----------------
+ * compiler constants for pg_tablesample_method
+ * ----------------
+ */
+#define Natts_pg_tablesample_method 8
+#define Anum_pg_tablesample_method_tsmname 1
+#define Anum_pg_tablesample_method_tsminit 2
+#define Anum_pg_tablesample_method_tsmnextblock 3
+#define Anum_pg_tablesample_method_tsmnexttuple 4
+#define Anum_pg_tablesample_method_tsmreturntuple 5
+#define Anum_pg_tablesample_method_tsmend 6
+#define Anum_pg_tablesample_method_tsmreset 7
+#define Anum_pg_tablesample_method_tsmcost 8
+
+/* ----------------
+ * initial contents of pg_tablesample_method
+ * ----------------
+ */
+
+DATA(insert OID = 3283 ( system tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3284 ( bernoulli tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41288ed..526be4c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1212,6 +1212,27 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+
+ /* Sampling method functions. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmreturntuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+
+ Buffer openbuffer; /* currently open buffer */
+ HeapTupleData tup; /* last tuple */
+
+ void *tsmdata; /* for use by table scan method */
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 97ef0fc..3276be8 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,6 +61,7 @@ typedef enum NodeTag
T_ValuesScan,
T_CteScan,
T_WorkTableScan,
+ T_SampleScan,
T_ForeignScan,
T_CustomScan,
T_Join,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -413,6 +415,8 @@ typedef enum NodeTag
T_XmlSerialize,
T_WithClause,
T_CommonTableExpr,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b1dfa85..7927a82 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -307,6 +307,24 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - a sampling method information
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmreturntuple;
+ Oid tsmend;
+ Oid tsmreset;
+ Oid tsmcost;
+ Node *repeatable;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -507,6 +525,21 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than SQL Standard so we pass generic function
+ * arguments to the sampling method.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ Node *repeatable;
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -751,6 +784,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 316c9ce..8a2a146 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -278,6 +278,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..24003ae 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9923f0e..89c8ded 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c243ec..6ff7b44 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -312,7 +312,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD)
PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD)
PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD)
-PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD)
+PG_KEYWORD("repeatable", REPEATABLE, RESERVED_KEYWORD)
PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD)
PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD)
PG_KEYWORD("reset", RESET, UNRESERVED_KEYWORD)
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 3264691..6727e55 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,10 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod,
+ Node *repeatable, List *args);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 6bd786d..185bd81 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index e3e7f9c..4ac208d 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -15,7 +15,12 @@
#include "storage/bufmgr.h"
-extern double sampler_random_fract(void);
+/* Random generator for sampling code */
+typedef unsigned short SamplerRandomState[3];
+
+extern void sampler_random_init_state(long seed,
+ SamplerRandomState randstate);
+extern double sampler_random_fract(SamplerRandomState randstate);
/* Block sampling methods */
/* Data structure for Algorithm S from Knuth 3.4.2 */
@@ -25,6 +30,7 @@ typedef struct
int n; /* desired sample size */
BlockNumber t; /* current block number */
int m; /* blocks selected so far */
+ SamplerRandomState randstate; /* random generator state */
} BlockSamplerData;
typedef BlockSamplerData *BlockSampler;
@@ -35,7 +41,12 @@ extern bool BlockSampler_HasMore(BlockSampler bs);
extern BlockNumber BlockSampler_Next(BlockSampler bs);
/* Reservoid sampling methods */
-typedef double ReservoirStateData;
+typedef struct
+{
+ double W;
+ SamplerRandomState randstate; /* random generator state */
+} ReservoirStateData;
+
typedef ReservoirStateData *ReservoirState;
extern void reservoir_init_selection_state(ReservoirState rs, int n);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..6b628f6 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/include/utils/tablesample.h b/src/include/utils/tablesample.h
new file mode 100644
index 0000000..3c97a04
--- /dev/null
+++ b/src/include/utils/tablesample.h
@@ -0,0 +1,33 @@
+/*--------------------------------------------------------------------------
+ * tablesample.h
+ * Header file for builtin table sampling methods.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/utils/tablesample.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TABLESAMPLE_H
+#define TABLESAMPLE_H
+
+typedef enum SamplerAccessStrategy
+{
+ SAS_RANDOM,
+ SAS_SEQUENTIAL
+} SamplerAccessStrategy;
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_system_cost(PG_FUNCTION_ARGS);
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_cost(PG_FUNCTION_ARGS);
+
+#endif /* TABLESAMPLE_H */
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..5946edf 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesample_method|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..9b387a2
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,165 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+ count
+-------
+ 10
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+ pg_get_viewdef
+--------------------------------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system (((10 * 2))::real) REPEATABLE (2);
+(1 row)
+
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+ pg_get_viewdef
+-----------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system ((99)::real);
+(1 row)
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+CLOSE tablesample_cur;
+END;
+DROP TABLE test_tablesample CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to view test_tablesample_v1
+drop cascades to view test_tablesample_v2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e0ae2f2..e0240ac 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address tablesample
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7f762bd..9a7611b 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -152,3 +152,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..2b89b55
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,39 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+CLOSE tablesample_cur;
+END;
+
+DROP TABLE test_tablesample CASCADE;
--
1.9.1
Hi, I took a look at this and found it nice.
By the way, ParseTableSample seems to allow the REPEATABLE parameter to be an
expression, but the grammar rejects it.
The following change seems to be enough.
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
;
opt_repeatable_clause:
- REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; }
+ REPEATABLE '(' a_expr ')' { $$ = (Node *) $3; }
| /*EMPTY*/ { $$ = NULL; }
;
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
On 19/01/15 07:08, Amit Kapila wrote:
On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
No issues, but it seems we should check other paths where
different handling could be required for tablesample scan.
In set_rel_size(), it uses the normal path for heap scans (set_plain_rel_size())
for rel size estimates, where it checks for the presence of partial indexes,
and that might affect the size estimates; that doesn't seem to
be required when we have to perform a scan based on the TableSample
method.

I think that's actually good to have, because we still do costing and
the partial index might help produce a better estimate of the number of
returned rows for tablesample as well.
As an issue related to size estimation, I got the following EXPLAIN result:
=# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a < 50000;
QUERY PLAN
--------------------------------------------------------------------------------
Sample Scan on t1 (cost=0.00..301.00 rows=10000 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
Filter: (a < 50000)
Rows Removed by Filter: 5876
Buffers: shared hit=45
The estimated row count is more than twice the actual one. tsm_system_cost
estimates the number of result rows using baserel->tuples,
not baserel->rows, so it doesn't reflect the scan quals. Is there
some other intention behind this code?
4.
SampleNext()
a.
Isn't it better to code SampleNext() similar to SeqNext() and
IndexNext(), which encapsulate the logic to get the tuple in
another function (tbs_next() or something like that)?

Possibly, my thinking was that unlike index_getnext() and
heap_getnext(), this function would not be called from any other
place, so adding one more layer of abstraction didn't seem useful.
And it would mean creating a new ScanDesc struct, etc.

I think adding additional abstraction would simplify the function
and reduce the chance of discrepancy; I think we need to almost
create a duplicate version of the code for the heapgetpage() method.
Yeah, I agree that we need to build a structure like ScanDesc, but
it will still be worth it to keep the code consistent.

Well, similar, not the same, as we are not always fetching the whole page or doing
visibility checks on all tuples in the page, etc. But I don't see why it
can't be inside nodeSamplescan. If you look at bitmap heap scan, that
one is also essentially a somewhat modified sequential scan and does
everything within the nodeBitmapHeapscan node because the algorithm is
not used anywhere else, just like sample scan.
As a general opinion, I'll vote for Amit's comment, since three
or four similar instances seem to be enough reason to abstract
it. On the other hand, since scan processing is dispatched
in ExecProcNode by the nodeTag of the scan nodes, reuniting
control in an abstraction layer after that could be said to
introduce a useless annoyance. In short, abstraction at the level
of *Next() would require additional changes around
the execution node dispatch.
Another one is PageIsAllVisible() check.
Done.
Another thing: don't we want to do anything for sync scans
for these methods, as they are doing a heap scan?

I don't follow this one tbh.
Refer to the parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
Basically, during sequential scans on the same table by different
backends, we attempt to keep them synchronized to reduce the I/O.

Ah this, yes, it makes sense for bernoulli, not for system though. I
guess it should be used for sampling methods that use the SAS_SEQUENTIAL
strategy.

c.
For the bernoulli method, it will first get the tupoffset with
the help of the function and then apply the visibility check; it seems
that could reduce the sample set way lower than expected
in case a lot of tuples are not visible. Shouldn't it be done the
other way around (first the visibility check, then call the function to see if we want to
include it in the sample)?
I think something similar is done in acquire_sample_rows, which
seems right to me.

I don't think so; the way bernoulli works is that it returns every
tuple with equal probability, so the visible tuples have the same chance
of being returned as the invisible ones, and the issue should be
smoothed away automatically (IMHO).

How will it get smoothed out for cases when, let us say,
more than 50% of tuples are not visible? I think it's
due to Postgres' non-overwriting storage architecture
that we maintain multiple versions of rows in the same heap;
otherwise, for a different kind of architecture (like mysql/oracle)
where multiple row versions are not maintained in the same
heap, the same function (with the same percentage) can return
an entirely different number of rows.

I don't know how else to explain: if we loop over every tuple in the
relation and return it with equal probability, then visibility checks
don't matter, as the percentage of visible tuples will be the same in the
result as in the relation.
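The claim above can be checked with a small simulation. The following is only a sketch, not code from the patch: the PRNG (splitmix64), the visibility rule and all function names are made up for illustration. It counts the returned tuples when the sampling decision comes before the visibility check, and when it comes after.

```c
#include <stdbool.h>
#include <stdint.h>

/* splitmix64: small deterministic PRNG, used here only for illustration. */
static uint64_t
splitmix64(uint64_t *state)
{
    uint64_t z = (*state += UINT64_C(0x9E3779B97F4A7C15));

    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1), built from the top 53 bits. */
static double
random_fract(uint64_t *state)
{
    return (double) (splitmix64(state) >> 11) / 9007199254740992.0;
}

/* Hypothetical visibility rule: a fixed share of every 100 tuples is visible. */
static bool
tuple_is_visible(long tupno, int visible_percent)
{
    return (tupno % 100) < visible_percent;
}

/* Sampling decision first, visibility check second. */
long
sample_then_check(long ntuples, int visible_percent, double prob, uint64_t seed)
{
    long returned = 0;

    for (long i = 0; i < ntuples; i++)
    {
        if (random_fract(&seed) >= prob)
            continue;           /* tuple not sampled */
        if (tuple_is_visible(i, visible_percent))
            returned++;
    }
    return returned;
}

/* Visibility check first, sampling decision only for visible tuples. */
long
check_then_sample(long ntuples, int visible_percent, double prob, uint64_t seed)
{
    long returned = 0;

    for (long i = 0; i < ntuples; i++)
    {
        if (!tuple_is_visible(i, visible_percent))
            continue;           /* invisible tuple is skipped */
        if (random_fract(&seed) < prob)
            returned++;
    }
    return returned;
}
```

With 100000 tuples, 30% of them visible and a 10% sampling probability, both orders should return roughly 3000 tuples; they differ only in how many random numbers are drawn.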
Surely it might yield effectively the same result. Even so,
this code needs an explanatory comment about the reasoning around the
code, or should be written so that MVCC people won't get confused, I suppose.
Whoops, time's up. Sorry for the incomplete comment.
regards,
For example, if you have 30% visible tuples and you are interested in 10%
of tuples overall, it will return 10% of all tuples, and since every tuple
has a 10% chance of being returned, 30% of those should be visible (if we
assume a smooth distribution of the random numbers generated). So in the end
you are getting 10% of the visible tuples, because the scan will obviously
skip the invisible ones, and that's what you wanted.

As I said, the problem with analyze is that it uses a tuple limit instead of
a probability.

5.
So the above test yielded 60% of rows the first time and 100% of rows the next time,
when the test requested 80%.
I understand that the sample percentage will result in an approximate
number of rows; however, I just wanted us to check whether the
implementation has any problem or not.

I think this is caused by the random generator not producing a smooth
enough random distribution on such a small sample (or possibly in
general?).

Yeah, it could be possible; I feel we should check with a large
sample of rows to see if there is any major difference.

In this regard, I have one question related to the code below:
Why are we not considering the tuple in the above code
if tupoffset is less than maxoffset?

We consider it; I will rename samplesize to probability or
something to make it clearer what it actually is, and maybe expand
the comment above the loop.

Yes, the difference is smaller (in relative numbers) for bigger tables
when I test this. On a 1k row table the spread of 20 runs was between
770-818 and on a 100k row table it's between 79868-80154. I think that's
acceptable variance, and it's IMHO indeed the random generator that
produces this.

Oh and BTW, when I delete 50k of tuples (make them invisible), the results
of 20 runs are between 30880 and 40063 rows.

I originally started from SeqScan but, well, it requires a completely
different State structure, it needs to call sampling methods in
various places (not just for the next tuple), and it needs different
handling in explain and in the optimizer (costing). If we add all this
to sequential scan, it will not be that much different from a new scan
node (we'd save a couple of new copy functions and one struct, but
I'd rather have a clearly distinct scan).

I understand that point, but I think it is worth considering whether
it can be done as a SeqScan node, especially because the plan node
doesn't need to store any additional information for the sample scan.

I think this point needs some more thought, and if none of us
can come up with a clean way to do it via the seqscan node then we can
proceed with the separate node idea.

Another reason why I am asking to consider it is that after
spending effort on further review and making the current approach
robust, it should not happen that someone else (probably one
of the committers) says that it can be done with the sequential scan
node without much trouble.

I am sure it could be done with sequential scan. Not sure if it would be
pretty and, more importantly, TABLESAMPLE is *not* a sequential scan.
The fact that the bernoulli method looks similar should not make us assume
that every other method does as well, especially when the system method is
completely different.

One other separate observation:
typedef struct SamplePath
{
    Path        path;
    Oid         tsmcost;    /* table sample method costing function */
    List       *tsmargs;    /* arguments to a TABLESAMPLE clause */
} SamplePath;

a.
Do we really need to have tsmcost and tsmargs stored in SamplePath
when we don't want to maintain them in the plan (SamplePlan)? The patch gets
all the info via the RTE in the executor, so it seems to me we can do without
them.

Hmm yeah, we actually don't; I removed it.
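On the run-to-run spread reported earlier in this mail (770-818 rows out of 1k and 79868-80154 out of 100k at 80%): if each tuple is selected independently with probability p, the number of selected rows is binomial, with mean n*p and variance n*p*(1-p). Below is a minimal sketch for comparing the observed numbers against that. It assumes independent per-tuple selection, which may not match what the tested method does, and the function names are hypothetical.

```c
#include <assert.h>

/* Variance of the number of selected rows under independent selection. */
double
bernoulli_count_variance(double ntuples, double prob)
{
    assert(prob >= 0.0 && prob <= 1.0);
    return ntuples * prob * (1.0 - prob);
}

/*
 * Return 1 if an observed count lies within k standard deviations of the
 * expected count n * p, else 0.  Squared values are compared so that no
 * square root is needed.
 */
int
within_k_sigma(double observed, double ntuples, double prob, double k)
{
    double dev = observed - ntuples * prob;

    return dev * dev <= k * k * bernoulli_count_variance(ntuples, prob);
}
```

For n = 1000 and p = 0.8 the standard deviation is about 12.6 rows, so 770 and 818 both fall within three standard deviations of 800; for n = 100000 it is about 126 rows, and 79868 and 80154 are within two.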
Anyway, attached is a new version with some of the updates that you mentioned
(all_visible, etc).
I also added an optional interface for the sampling method to access the
tuple contents, as I can imagine sampling methods that will want to do
that. And I updated the test/sample module to use this new API.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 28/01/15 09:41, Kyotaro HORIGUCHI wrote:
As an issue related to size estimation, I got the following EXPLAIN result:

=# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a < 50000;
QUERY PLAN
--------------------------------------------------------------------------------
Sample Scan on t1 (cost=0.00..301.00 rows=10000 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
Filter: (a < 50000)
Rows Removed by Filter: 5876
Buffers: shared hit=45

The estimated row count is more than twice the actual one. tsm_system_cost
estimates the number of result rows using baserel->tuples,
not baserel->rows, so it doesn't reflect the scan quals. Is there
some other intention behind this code?

No, that's a bug.
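To illustrate the difference, a minimal sketch follows. RelEstimate is a hypothetical stand-in for the two RelOptInfo fields mentioned above (tuples and rows), and the function names are made up; this is not the patch's tsm_system_cost. The numbers in the note below assume a 100000-tuple table (inferred from rows=10000 at 10%) and a qual selectivity of 0.5, which is a guess.

```c
#include <assert.h>

/* Hypothetical stand-in for the two RelOptInfo fields named in the thread. */
typedef struct RelEstimate
{
    double tuples;  /* total tuples in the relation */
    double rows;    /* tuples expected to pass the scan quals */
} RelEstimate;

/* Estimate as reported: scales the raw tuple count and ignores the quals. */
double
sample_rows_from_tuples(const RelEstimate *rel, double percent)
{
    assert(percent >= 0.0 && percent <= 100.0);
    return rel->tuples * percent / 100.0;
}

/* Qual-aware alternative: scales the post-qual row estimate instead. */
double
sample_rows_from_rows(const RelEstimate *rel, double percent)
{
    assert(percent >= 0.0 && percent <= 100.0);
    return rel->rows * percent / 100.0;
}
```

With tuples = 100000, rows = 50000 and a 10% sample, the first function gives 10000 (matching the plan above) and the second gives 5000, which is much closer to the 4294 rows actually returned.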
4.
SampleNext()
a.
Isn't it better to code SampleNext() similar to SeqNext() and
IndexNext(), which encapsulate the logic to get the tuple in
another function (tbs_next() or something like that)?

Possibly, my thinking was that unlike index_getnext() and
heap_getnext(), this function would not be called from any other
place, so adding one more layer of abstraction didn't seem useful.
And it would mean creating a new ScanDesc struct, etc.

I think adding additional abstraction would simplify the function
and reduce the chance of discrepancy; I think we need to almost
create a duplicate version of the code for the heapgetpage() method.
Yeah, I agree that we need to build a structure like ScanDesc, but
it will still be worth it to keep the code consistent.

Well, similar, not the same, as we are not always fetching the whole page or doing
visibility checks on all tuples in the page, etc. But I don't see why it
can't be inside nodeSamplescan. If you look at bitmap heap scan, that
one is also essentially a somewhat modified sequential scan and does
everything within the nodeBitmapHeapscan node because the algorithm is
not used anywhere else, just like sample scan.

As a general opinion, I'll vote for Amit's comment, since three
or four similar instances seem to be enough reason to abstract
it. On the other hand, since scan processing is dispatched
in ExecProcNode by the nodeTag of the scan nodes, reuniting
control in an abstraction layer after that could be said to
introduce a useless annoyance. In short, abstraction at the level
of *Next() would require additional changes around
the execution node dispatch.

Yes, that's my view too. I would generally be for that change also and
it would be worth it if the code was used in more than one place, but as
it is it seems like it will just add code/complexity for no real
benefit. It would make sense in case we used sequential scan node
instead of the new node as Amit also suggested, but I remain unconvinced
that mixing sampling and sequential scan into single scan node would be
a good idea.
How will it get smoothed out for cases when, let us say,
more than 50% of tuples are not visible? I think it's
due to Postgres' non-overwriting storage architecture
that we maintain multiple versions of rows in the same heap;
for a different kind of architecture (like mysql/oracle)
where multiple row versions are not maintained in the same
heap, the same function (with the same percentage) can return
an entirely different number of rows.
I don't know how else to explain, if we loop over every tuple in the
relation and return it with equal probability then visibility checks
don't matter as the percentage of visible tuples will be same in the
result as in the relation.
Surely it might yield effectively the same result. Even so,
this code needs an explanatory comment about the reasoning behind
it, or should be written so that MVCC people won't get confused, I suppose.
Yes it does, but as I am failing to explain even here, it's not clear to
me what to write there. From my point of view it's just effect of the
essential property of bernoulli sampling which is that every element has
equal probability of being included in the sample. It comes from the
fact that we do bernoulli trial (in the code it's the while
(sampler_random_fract(sampler->randstate) > probability) loop) on every
individual tuple.
This means that the ratio of visible and invisible tuples should be same
in the sample as it is in the relation. We then just skip the invisible
tuples and get the correct percentage of the visible ones (this has
performance benefit of not having to do visibility check on all tuples).
If that weren't true then bernoulli sampling would simply not
work as intended as the same property is the reason why it's used in
statistics - the trends should look the same in sample as they look in
the source population.
Obviously there is some variation in practice as we don't have perfect
random generator but that's independent of the algorithm.
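The argument above can be checked with a small simulation. This is only a sketch under stated assumptions, not the patch's code: the generator is splitmix64 (swapped in for sampler_random_fract so the example is deterministic), visibility is modelled as a fixed pattern of visible_per_ten visible tuples in every ten, and bernoulli_scan and SampleCounts are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64: a small deterministic generator, used only for this demo. */
static uint64_t
next_u64(uint64_t *state)
{
    uint64_t z = (*state += UINT64_C(0x9E3779B97F4A7C15));

    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1), built from the top 53 bits. */
static double
next_fract(uint64_t *state)
{
    return (double) (next_u64(state) >> 11) / 9007199254740992.0;
}

typedef struct
{
    long sampled;   /* tuples that passed the Bernoulli trial */
    long returned;  /* sampled tuples that were also visible */
} SampleCounts;

/*
 * Run one Bernoulli trial per tuple, visible or not, and check
 * visibility only for the tuples that were picked.
 */
static SampleCounts
bernoulli_scan(long ntuples, int visible_per_ten, double probability,
               uint64_t seed)
{
    SampleCounts counts = {0, 0};
    uint64_t state = seed;

    for (long i = 0; i < ntuples; i++)
    {
        if (next_fract(&state) >= probability)
            continue;           /* trial failed, tuple is never inspected */
        counts.sampled++;
        if (i % 10 < visible_per_ten)
            counts.returned++;  /* invisible picks are silently skipped */
    }
    return counts;
}
```

With 100000 tuples, 30% of them visible and a probability of 0.10, the scan should pick roughly 10000 tuples and return roughly 3000, that is about 10% of the 30000 visible ones, while doing visibility checks only on the picked tuples.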
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 28/01/15 08:23, Kyotaro HORIGUCHI wrote:
Hi, I took a look at this and found it nice.
By the way, ParseTableSample seems to allow the REPEATABLE parameter
to be an expression, but the grammar rejects it.
It wasn't my intention to support it, but you are correct, the code is
generic enough that we can support it.
The following change seems enough.
Seems about right, thanks.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Yes, that's my view too. I would generally be for that change also and it
would be worth it if the code was used in more than one place, but as it is
it seems like it will just add code/complexity for no real benefit. It would
make sense in case we used sequential scan node instead of the new node as
Amit also suggested, but I remain unconvinced that mixing sampling and
sequential scan into single scan node would be a good idea.
Based on previous experience, I expect that any proposal to merge
those nodes would get shot down by Tom with his laser-guided atomic
bazooka faster than you can say "-1 from me regards tom lane".
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 29, 2015 at 11:08:55AM -0500, Robert Haas wrote:
On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Yes, that's my view too. I would generally be for that change also and it
would be worth it if the code was used in more than one place, but as it is
it seems like it will just add code/complexity for no real benefit. It would
make sense in case we used sequential scan node instead of the new node as
Amit also suggested, but I remain unconvinced that mixing sampling and
sequential scan into single scan node would be a good idea.
Based on previous experience, I expect that any proposal to merge
those nodes would get shot down by Tom with his laser-guided atomic
bazooka faster than you can say "-1 from me regards tom lane".
Do we get illustrations with that? ;-) I want a poster for my wall!
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
On 1/29/15 10:44 AM, Bruce Momjian wrote:
On Thu, Jan 29, 2015 at 11:08:55AM -0500, Robert Haas wrote:
On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Yes, that's my view too. I would generally be for that change also and it
would be worth it if the code was used in more than one place, but as it is
it seems like it will just add code/complexity for no real benefit. It would
make sense in case we used sequential scan node instead of the new node as
Amit also suggested, but I remain unconvinced that mixing sampling and
sequential scan into single scan node would be a good idea.
Based on previous experience, I expect that any proposal to merge
those nodes would get shot down by Tom with his laser-guided atomic
bazooka faster than you can say "-1 from me regards tom lane".
Do we get illustrations with that? ;-) I want a poster for my wall!
+1. It should also be the tshirt for the next pgCon. ;)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 19/01/15 07:08, Amit Kapila wrote:
On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
I think that's actually good to have, because we still do costing and the
partial index might help produce better estimate of number of returned rows
for tablesample as well.
I don't understand how it will help, because the tablesample scan
doesn't consider partial indexes at all as per the patch.
CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
generate_series(0, 9) s(i) ORDER BY i;
INSERT INTO test_tablesample values(generate_series(10,10000));
create index idx_tblsample on test_tablesample(id) where id>9999;
Analyze test_tablesample;
postgres=# explain SELECT id FROM test_tablesample where id > 9999;
QUERY PLAN
-------------------------------------------------------------------------------------------
Index Only Scan using idx_tblsample on test_tablesample (cost=0.13..8.14 rows=1 width=4)
Index Cond: (id > 9999)
(2 rows)
postgres=# explain SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80) where id > 9999;
QUERY PLAN
------------------------------------------------------------------------
Sample Scan on test_tablesample (cost=0.00..658.00 rows=8000 width=4)
Filter: (id > 9999)
(2 rows)
The above result also clearly indicates that when the TABLESAMPLE
clause is used, it won't use the partial index.
Well similar, not same as we are not always fetching whole page or doing
visibility checks on all tuples in the page, etc. But I don't see why it
can't be inside nodeSamplescan. If you look at bitmap heap scan, that one
is also essentially somewhat modified sequential scan and does everything
within the node nodeBitmapHeapscan because the algorithm is not used
anywhere else, just like sample scan.
I don't mind doing everything in nodeSamplescan; however, if
you can split the function, it will be easier to read and understand.
If you look at nodeBitmapHeapscan, it also has a function like
bitgetpage().
This is just a suggestion: if you think it can be split,
then fine, otherwise leave it as it is.
Refer to the parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
Basically, during sequential scans of the same table by different
backends, we attempt to keep them synchronized to reduce the I/O.
Ah this, yes, it makes sense for bernoulli, not for system though. I guess
it should be used for sampling methods that use SAS_SEQUENTIAL strategy.
Have you taken care of this in your latest patch?
I don't know how else to explain, if we loop over every tuple in the
relation and return it with equal probability then visibility checks don't
matter as the percentage of visible tuples will be same in the result as in
the relation.
For example if you have 30% visible tuples and you are interested in 10%
of tuples overall it will return 10% of all tuples and since every tuple
has 10% chance of being returned, 30% of those should be visible (if we
assume smooth distribution of random numbers generated). So in the end you
are getting 10% of visible tuples because the scan will obviously skip the
invisible ones and that's what you wanted.
As I said problem of analyze is that it uses tuple limit instead of
probability.
Okay, got the point.
Yes, the difference is smaller (in relative numbers) for bigger tables
when I test this. On 1k row table the spread of 20 runs was between 770-818
and on 100k row table it's between 79868-80154. I think that's acceptable
variance and it's imho indeed the random generator that produces this.
Sure, I think this is acceptable.
Oh and BTW when I delete 50k of tuples (make them invisible) the results
of 20 runs are between 30880 and 40063 rows.
This is between 60% to 80%, lower than what is expected,
but I guess we can't do much for this except for trying with
reverse order for visibility test and sample tuple call,
you can decide if you want to try that once just to see if that
is better.
I am sure it could be done with sequence scan. Not sure if it would be
pretty and more importantly, the TABLESAMPLE is *not* sequential scan. The
fact that bernoulli method looks similar should not make us assume that
every other method does as well, especially when system method is
completely different.
Okay, lets keep it as separate.
Anyway, attached is new version with some updates that you mentioned
(all_visible, etc).
I also added optional interface for the sampling method to access the
tuple contents as I can imagine sampling methods that will want to do that.
+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples.
+ */
+ if (OidIsValid(node->tsmreturntuple.fn_oid))
+ {
+ found = DatumGetBool(FunctionCall4(&node->tsmreturntuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ PointerGetDatum(tuple),
+ BoolGetDatum(visible)));
+ /* XXX: better error */
+ if (found && !visible)
+ elog(ERROR, "Sampling method wanted to return invisible tuple");
+ }
You mention in the comment that the sampling method may examine
invisible tuples, but it is not clear why you want to allow that
and then later raise an error. Why can't it just skip invisible tuples?
1.
How about statistics (pgstat_count_heap_getnext()) during
SampleNext as we do in heap_getnext?
2.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
..
+ /*
+ * Lock the buffer so we can safely assess tuple
+ * visibility later.
+ */
+ LockBuffer(buffer, BUFFER_LOCK_SHARE);
..
}
When is this content lock released, shouldn't we release it after
checking visibility of tuple?
3.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
{
..
}
Currently this function returns as soon as it sees one valid tuple.
Isn't it better to do some caching for tuples on the same page,
like we do in heapgetpage()
(scan->rs_vistuples[ntup++] = lineoff;)? Basically that can avoid
taking the content lock and some other overhead of operating on a
page.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 31/01/15 14:27, Amit Kapila wrote:
On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 19/01/15 07:08, Amit Kapila wrote:
On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
I think that's actually good to have, because we still do costing
and the partial index might help produce better estimate of number
of returned rows for tablesample as well.
I don't understand how it will help, because the tablesample scan
doesn't consider partial indexes at all as per the patch.
Oh I think we were talking about different things then, I thought you
were talking about the index checks that planner/optimizer sometimes
does to get more accurate statistics. I'll take another look then.
Well similar, not same as we are not always fetching whole page or
doing visibility checks on all tuples in the page, etc. But I don't
see why it can't be inside nodeSamplescan. If you look at bitmap
heap scan, that one is also essentially somewhat modified sequential
scan and does everything within the node nodeBitmapHeapscan because
the algorithm is not used anywhere else, just like sample scan.
I don't mind doing everything in nodeSamplescan; however, if
you can split the function, it will be easier to read and understand.
If you look at nodeBitmapHeapscan, it also has a function like
bitgetpage().
This is just a suggestion: if you think it can be split,
then fine, otherwise leave it as it is.
Yeah, I can split it into a separate function within nodeSamplescan, that
sounds reasonable.
Refer to the parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
Basically, during sequential scans of the same table by different
backends, we attempt to keep them synchronized to reduce the I/O.
Ah this, yes, it makes sense for bernoulli, not for system though. I
guess it should be used for sampling methods that use SAS_SEQUENTIAL
strategy.
Have you taken care of this in your latest patch?
Not yet. I think I will need to make the strategy a property of the
sampling method instead of returning it from the costing function, so
that the info can be used by the scan.
Oh and BTW when I delete 50k of tuples (make them invisible) the
results of 20 runs are between 30880 and 40063 rows.
This is between 60% to 80%, lower than what is expected,
but I guess we can't do much for this except for trying with
reverse order for visibility test and sample tuple call,
you can decide if you want to try that once just to see if that
is better.
No, that's because I can't write properly, the lower number was supposed
to be 39880 which is well within the tolerance, sorry for the confusion
(9 and 0 are just too close).
Anyway, attached is new version with some updates that you mentioned
(all_visible, etc).
I also added optional interface for the sampling method to access
the tuple contents as I can imagine sampling methods that will want
to do that.
+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples.
+ */
+ if (OidIsValid(node->tsmreturntuple.fn_oid))
+ {
+ found = DatumGetBool(FunctionCall4(&node->tsmreturntuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ PointerGetDatum(tuple),
+ BoolGetDatum(visible)));
+ /* XXX: better error */
+ if (found && !visible)
+ elog(ERROR, "Sampling method wanted to return invisible tuple");
+ }
You mention in the comment that the sampling method may examine
invisible tuples, but it is not clear why you want to allow that
and then later raise an error. Why can't it just skip invisible tuples?
Well I think returning invisible tuples to user is bad idea and that's
why the check, but I also think it might make sense to examine the
invisible tuple for the sampling function in case it wants to create
some kind of stats about the scan and wants to use those for making the
decision about returning other tuples. The interface should be probably
called tsmexaminetuple instead to make it more clear what the intention is.
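The contract described here (the method may look at every tuple, but may only accept visible ones) can be sketched in plain C. This is an illustration only, not the patch's executor code: ExamineTupleFn, TupleDecision, decide_tuple and the two example callbacks are hypothetical names, and the behaviour without a callback (return every visible tuple) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: may inspect any tuple, returns true to accept it. */
typedef bool (*ExamineTupleFn) (void *state, unsigned blockno,
                                int tupoffset, bool visible);

typedef enum
{
    TUPLE_SKIP,     /* do not return this tuple */
    TUPLE_RETURN,   /* hand the tuple to the caller */
    TUPLE_ERROR     /* method tried to return an invisible tuple */
} TupleDecision;

/*
 * Let the method examine the tuple, then enforce that only visible
 * tuples can ever be returned.
 */
static TupleDecision
decide_tuple(ExamineTupleFn examine, void *state, unsigned blockno,
             int tupoffset, bool visible)
{
    bool found = visible;   /* assumed default when there is no callback */

    if (examine != NULL)
    {
        found = examine(state, blockno, tupoffset, visible);
        if (found && !visible)
            return TUPLE_ERROR;
    }
    return found ? TUPLE_RETURN : TUPLE_SKIP;
}

/* Example method state: keeps statistics about dead tuples it has seen. */
typedef struct
{
    long dead_seen;
} DeadCounter;

/* Well-behaved method: counts invisible tuples, accepts only visible ones. */
static bool
count_dead_examine(void *state, unsigned blockno, int tupoffset, bool visible)
{
    DeadCounter *counter = state;

    (void) blockno;
    (void) tupoffset;
    if (!visible)
        counter->dead_seen++;
    return visible;
}

/* Misbehaving method: accepts everything, including invisible tuples. */
static bool
greedy_examine(void *state, unsigned blockno, int tupoffset, bool visible)
{
    (void) state;
    (void) blockno;
    (void) tupoffset;
    (void) visible;
    return true;
}
```

The first callback shows the use case mentioned above, gathering statistics from invisible tuples without returning them; the second shows the case the error check is there to catch.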
1.
How about statistics (pgstat_count_heap_getnext()) during
SampleNext as we do in heap_getnext?
Right, will add.
2.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
..
+ /*
+ * Lock the buffer so we can safely assess tuple
+ * visibility later.
+ */
+ LockBuffer(buffer, BUFFER_LOCK_SHARE);
..
}
When is this content lock released? Shouldn't we release it after
checking visibility of the tuple?
Here,
+ if (!OffsetNumberIsValid(tupoffset))
+ {
+ UnlockReleaseBuffer(buffer);
but yes, you are correct: the buffer should only be released there, and
we can already unlock it right after the visibility check.
3.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
{
..
}
Currently this function returns as soon as it sees one valid tuple.
Isn't it better to do some caching for tuples on the same page,
like we do in heapgetpage()
(scan->rs_vistuples[ntup++] = lineoff;)? Basically that can avoid
taking the content lock and some other overhead of operating on a
page.
That's a somewhat hard question. It would make sense in cases where we
read most of the page (which is true for system sampling, for example),
but it would probably slow things down where we select a small
number of tuples (say 1), which is true for bernoulli with a small
percentage parameter. It's actually quite easy to imagine that on really
big tables (which is where TABLESAMPLE makes sense) we might get blocks
where we don't actually read any tuples. This is where optimizing for
one sampling method will hurt another, so I don't know what's better
here. Perhaps the sampling method should have an option that says whether
it prefers page-mode reading or not, because only the author knows this.
Anyway, I am thinking of making heapgetpage() public and using it
directly. It will mean that we have to initialize a HeapScanDesc, which
might add a couple of lines, but we already have to keep the last buffer,
last tuple and position info in the scan info anyway, so we can use
HeapScanDesc for that instead. There will be a couple of properties of
HeapScanDesc we don't use, but I don't think we care.
BTW I don't expect to have time to work on this patch in next ~10 days
so I will move it to Feb commitfest.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 31/01/15 20:08, Petr Jelinek wrote:
On 31/01/15 14:27, Amit Kapila wrote:
On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 19/01/15 07:08, Amit Kapila wrote:
On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
I think that's actually good to have, because we still do costing
and the partial index might help produce better estimate of number
of returned rows for tablesample as well.
I don't understand how it will help, because the tablesample scan
doesn't consider partial indexes at all as per the patch.
Oh I think we were talking about different things then, I thought you
were talking about the index checks that planner/optimizer sometimes
does to get more accurate statistics. I'll take another look then.
You were correct here, I removed the index consideration code for
tablesample. I went through the planner to see if the RTE needs special
handling anywhere else and it does not seem like it to me.
Refer to the parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
Basically, during sequential scans of the same table by different
backends, we attempt to keep them synchronized to reduce the I/O.
Ah this, yes, it makes sense for bernoulli, not for system though. I
guess it should be used for sampling methods that use SAS_SEQUENTIAL
strategy.
Have you taken care of this in your latest patch?
I added support for this, the sampling method has seqscan boolean now
which determines this.
+ /* XXX: better error */
+ if (found && !visible)
+ elog(ERROR, "Sampling method wanted to return invisible tuple");
+ }
You mention in the comment that the sampling method may examine
invisible tuples, but it is not clear why you want to allow that
and then later raise an error. Why can't it just skip invisible tuples?
Well I think returning invisible tuples to user is bad idea and that's
why the check, but I also think it might make sense to examine the
invisible tuple for the sampling function in case it wants to create
some kind of stats about the scan and wants to use those for making the
decision about returning other tuples. The interface should be probably
called tsmexaminetuple instead to make it more clear what the intention is.
I added a comment to the code about this, explaining why I think it's a
good idea.
1.
How about statistics (pgstat_count_heap_getnext()) during
SampleNext as we do in heap_getnext?
Added.
Otherwise I moved the code for getting the next tuple into a separate
function inside nodeSamplescan. It uses logic similar to heapgettup
and it uses heapgetpage (which is now exported) to read the blocks. It
also uses a normal HeapScanDesc like sequential and bitmapindex scans do.
I didn't add the whole-page visibility caching, as the tuple ids we get
from sampling methods don't map well to the visibility info we get from
heapgetpage (they map to the values in the rs_vistuples array, not to
its indexes). I commented about it in the code as well.
I also fixed the estimation issue reported by Kyotaro HORIGUCHI (that was
just me being stupid and using baserel->tuples instead of baserel->rows).
And I made some minor changes: added pg_dump support, documented the new
catalog, renamed tsmreturntuple to tsmexaminetuple, which fits better
IMHO, etc.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-separate-block-sampling-functions-v2.patch (text/x-diff)
From de3bfeb6275228b03944fdf39cee0277d97c4449 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:36:56 +0100
Subject: [PATCH 1/3] separate block sampling functions v2
---
contrib/file_fdw/file_fdw.c | 9 +-
contrib/postgres_fdw/postgres_fdw.c | 10 +-
src/backend/commands/analyze.c | 225 +----------------------------------
src/backend/utils/misc/Makefile | 2 +-
src/backend/utils/misc/sampling.c | 226 ++++++++++++++++++++++++++++++++++++
src/include/commands/vacuum.h | 3 -
src/include/utils/sampling.h | 44 +++++++
7 files changed, 287 insertions(+), 232 deletions(-)
create mode 100644 src/backend/utils/misc/sampling.c
create mode 100644 src/include/utils/sampling.h
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index d569760..df732c0 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -34,6 +34,7 @@
#include "optimizer/var.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -1005,7 +1006,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
{
int numrows = 0;
double rowstoskip = -1; /* -1 means not set yet */
- double rstate;
+ ReservoirStateData rstate;
TupleDesc tupDesc;
Datum *values;
bool *nulls;
@@ -1043,7 +1044,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
ALLOCSET_DEFAULT_MAXSIZE);
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
@@ -1087,7 +1088,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* not-yet-incremented value of totalrows as t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(*totalrows, targrows, &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, *totalrows, targrows);
if (rowstoskip <= 0)
{
@@ -1095,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d76e739..cbcba6e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -37,6 +37,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -202,7 +203,7 @@ typedef struct PgFdwAnalyzeState
/* for random sampling */
double samplerows; /* # of rows fetched */
double rowstoskip; /* # of rows to skip before next sample */
- double rstate; /* random state */
+ ReservoirStateData rstate; /* state for reservoir sampling*/
/* working memory contexts */
MemoryContext anl_cxt; /* context for per-analyze lifespan data */
@@ -2393,7 +2394,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
astate.numrows = 0;
astate.samplerows = 0;
astate.rowstoskip = -1; /* -1 means not set yet */
- astate.rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&astate.rstate, targrows);
/* Remember ANALYZE context, and create a per-tuple temp context */
astate.anl_cxt = CurrentMemoryContext;
@@ -2533,13 +2534,12 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
* analyze.c; see Jeff Vitter's paper.
*/
if (astate->rowstoskip < 0)
- astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
- &astate->rstate);
+ astate->rowstoskip = reservoir_get_next_S(&astate->rstate, astate->samplerows, targrows);
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * anl_random_fract());
+ pos = (int) (targrows * sampler_random_fract());
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d2856a3..fc9dd44 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -88,10 +78,6 @@ static BufferAccessStrategy vac_strategy;
static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -947,94 +933,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1080,7 +978,7 @@ acquire_sample_rows(Relation onerel, int elevel,
BlockNumber totalblocks;
TransactionId OldestXmin;
BlockSamplerData bs;
- double rstate;
+ ReservoirStateData rstate;
Assert(targrows > 0);
@@ -1090,9 +988,9 @@ acquire_sample_rows(Relation onerel, int elevel,
OldestXmin = GetOldestXmin(onerel, true);
/* Prepare for sampling block numbers */
- BlockSampler_Init(&bs, totalblocks, targrows);
+ BlockSampler_Init(&bs, totalblocks, targrows, random());
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Outer loop over blocks to sample */
while (BlockSampler_HasMore(&bs))
@@ -1240,8 +1138,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(samplerows, targrows,
- &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, samplerows, targrows);
if (rowstoskip <= 0)
{
@@ -1249,7 +1146,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1308,116 +1205,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
-/*
- * These two routines embody Algorithm Z from "Random sampling with a
- * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
- * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
- * of the count S of records to skip before processing another record.
- * It is computed primarily based on t, the number of records already read.
- * The only extra state needed between calls is W, a random state variable.
- *
- * anl_init_selection_state computes the initial W value.
- *
- * Given that we've already read t records (t >= n), anl_get_next_S
- * determines the number of records to skip before the next record is
- * processed.
- */
-double
-anl_init_selection_state(int n)
-{
- /* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
-}
-
-double
-anl_get_next_S(double t, int n, double *stateptr)
-{
- double S;
-
- /* The magic constant here is T from Vitter's paper */
- if (t <= (22.0 * n))
- {
- /* Process records using Algorithm X until t is large enough */
- double V,
- quot;
-
- V = anl_random_fract(); /* Generate V */
- S = 0;
- t += 1;
- /* Note: "num" in Vitter's code is always equal to t - n */
- quot = (t - (double) n) / t;
- /* Find min S satisfying (4.1) */
- while (quot > V)
- {
- S += 1;
- t += 1;
- quot *= (t - (double) n) / t;
- }
- }
- else
- {
- /* Now apply Algorithm Z */
- double W = *stateptr;
- double term = t - (double) n + 1;
-
- for (;;)
- {
- double numer,
- numer_lim,
- denom;
- double U,
- X,
- lhs,
- rhs,
- y,
- tmp;
-
- /* Generate U and X */
- U = anl_random_fract();
- X = t * (W - 1.0);
- S = floor(X); /* S is tentatively set to floor(X) */
- /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
- tmp = (t + 1) / term;
- lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
- rhs = (((t + X) / (term + S)) * term) / t;
- if (lhs <= rhs)
- {
- W = rhs / lhs;
- break;
- }
- /* Test if U <= f(S)/cg(X) */
- y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
- if ((double) n < S)
- {
- denom = t;
- numer_lim = term + S;
- }
- else
- {
- denom = t - (double) n + S;
- numer_lim = t + 1;
- }
- for (numer = t + S; numer >= numer_lim; numer -= 1)
- {
- y *= numer / denom;
- denom -= 1;
- }
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
- if (exp(log(y) / n) <= (t + X) / t)
- break;
- }
- *stateptr = W;
- }
- return S;
-}
-
/*
* qsort comparator for sorting rows[] array
*/
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 378b77e..7889101 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o rls.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..1eeabaf
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Relation block sampling routines.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler provides an algorithm for block-level sampling of a relation,
+ * as discussed on pgsql-hackers 2004-04-02 (subject "Large DB").
+ * It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has fewer than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
+ long randseed)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires a sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+/*
+ * These two routines embody Algorithm Z from "Random sampling with a
+ * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
+ * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
+ * of the count S of records to skip before processing another record.
+ * It is computed primarily based on t, the number of records already read.
+ * The only extra state needed between calls is W, a random state variable.
+ *
+ * reservoir_init_selection_state computes the initial W value.
+ *
+ * Given that we've already read t records (t >= n), reservoir_get_next_S
+ * determines the number of records to skip before the next record is
+ * processed.
+ */
+void
+reservoir_init_selection_state(ReservoirState rs, int n)
+{
+ /* Initial value of W (for use when Algorithm Z is first applied) */
+ *rs = exp(-log(sampler_random_fract()) / n);
+}
+
+double
+reservoir_get_next_S(ReservoirState rs, double t, int n)
+{
+ double S;
+
+ /* The magic constant here is T from Vitter's paper */
+ if (t <= (22.0 * n))
+ {
+ /* Process records using Algorithm X until t is large enough */
+ double V,
+ quot;
+
+ V = sampler_random_fract(); /* Generate V */
+ S = 0;
+ t += 1;
+ /* Note: "num" in Vitter's code is always equal to t - n */
+ quot = (t - (double) n) / t;
+ /* Find min S satisfying (4.1) */
+ while (quot > V)
+ {
+ S += 1;
+ t += 1;
+ quot *= (t - (double) n) / t;
+ }
+ }
+ else
+ {
+ /* Now apply Algorithm Z */
+ double W = *rs;
+ double term = t - (double) n + 1;
+
+ for (;;)
+ {
+ double numer,
+ numer_lim,
+ denom;
+ double U,
+ X,
+ lhs,
+ rhs,
+ y,
+ tmp;
+
+ /* Generate U and X */
+ U = sampler_random_fract();
+ X = t * (W - 1.0);
+ S = floor(X); /* S is tentatively set to floor(X) */
+ /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
+ tmp = (t + 1) / term;
+ lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
+ rhs = (((t + X) / (term + S)) * term) / t;
+ if (lhs <= rhs)
+ {
+ W = rhs / lhs;
+ break;
+ }
+ /* Test if U <= f(S)/cg(X) */
+ y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
+ if ((double) n < S)
+ {
+ denom = t;
+ numer_lim = term + S;
+ }
+ else
+ {
+ denom = t - (double) n + S;
+ numer_lim = t + 1;
+ }
+ for (numer = t + S; numer >= numer_lim; numer -= 1)
+ {
+ y *= numer / denom;
+ denom -= 1;
+ }
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ if (exp(log(y) / n) <= (t + X) / t)
+ break;
+ }
+ *rs = W;
+ }
+ return S;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+
+/* Select a random value R uniformly distributed in (0 - 1) */
+double
+sampler_random_fract()
+{
+ return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+}
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4275484..d38fead 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -178,8 +178,5 @@ extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
bool in_outer_xact, BufferAccessStrategy bstrategy);
extern bool std_typanalyze(VacAttrStats *stats);
-extern double anl_random_fract(void);
-extern double anl_init_selection_state(int n);
-extern double anl_get_next_S(double t, int n, double *stateptr);
#endif /* VACUUM_H */
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..e3e7f9c
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+extern double sampler_random_fract(void);
+
+/* Block sampling methods */
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize, long randseed);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Reservoir sampling methods */
+typedef double ReservoirStateData;
+typedef ReservoirStateData *ReservoirState;
+
+extern void reservoir_init_selection_state(ReservoirState rs, int n);
+extern double reservoir_get_next_S(ReservoirState rs, double t, int n);
+
+#endif /* SAMPLING_H */
--
1.9.1
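
For readers following the BlockSampler_Next comment in the patch above, here is a minimal standalone sketch of the naive form of Knuth's Algorithm S that the comment refers to: one random draw per block, selecting each block with probability k/K. It is not part of the patch. The demo_* names and the small LCG standing in for sampler_random_fract() are assumptions made purely for illustration; the patch's version instead reuses a single random value per selected block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for sampler_random_fract(): returns a value
 * strictly inside (0, 1). Uses a simple LCG, for illustration only.
 */
static double
demo_random_fract(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    /* top 24 bits, shifted into 1..2^24, then scaled by 2^24 + 2 */
    return ((double) (*state >> 8) + 1.0) / ((double) (1u << 24) + 2.0);
}

/*
 * Select min(samplesize, nblocks) block numbers from [0, nblocks), in
 * increasing order, using the naive form of Algorithm S.
 * Returns the number of entries written to out[].
 */
static int
demo_sample_blocks(uint32_t nblocks, int samplesize, uint32_t *out,
                   uint32_t seed)
{
    uint32_t t = 0;         /* blocks scanned so far */
    int      m = 0;         /* blocks selected so far */
    uint32_t state = seed;

    while (t < nblocks && m < samplesize)
    {
        uint32_t K = nblocks - t;       /* remaining blocks */
        int      k = samplesize - m;    /* blocks still to sample */

        /*
         * Select the current block with probability k/K. When k >= K we
         * need all the remaining blocks, so no random draw is made.
         */
        if ((uint32_t) k >= K ||
            demo_random_fract(&state) * (double) K < (double) k)
            out[m++] = t;
        t++;
    }
    return m;
}
```

Because a block is always taken once k reaches K, the loop cannot fall short of the requested sample size, which is the same invariant the comment in BlockSampler_Next argues for.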
Attachment: 0002-tablesample-v8.patch (text/x-diff)
From 7ccd2dcbd95a8c759068bd034f810b22e93748d8 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:37:55 +0100
Subject: [PATCH 2/3] tablesample v8
---
contrib/file_fdw/file_fdw.c | 2 +-
contrib/postgres_fdw/postgres_fdw.c | 2 +-
doc/src/sgml/catalogs.sgml | 112 +++++++
doc/src/sgml/ref/select.sgml | 38 ++-
src/backend/access/Makefile | 3 +-
src/backend/access/heap/heapam.c | 7 +-
src/backend/catalog/Makefile | 2 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/explain.c | 7 +
src/backend/executor/Makefile | 2 +-
src/backend/executor/execAmi.c | 8 +
src/backend/executor/execCurrent.c | 1 +
src/backend/executor/execProcnode.c | 14 +
src/backend/executor/nodeSamplescan.c | 500 ++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 59 ++++
src/backend/nodes/equalfuncs.c | 36 ++
src/backend/nodes/nodeFuncs.c | 12 +
src/backend/nodes/outfuncs.c | 47 +++
src/backend/nodes/readfuncs.c | 44 +++
src/backend/optimizer/path/allpaths.c | 49 +++
src/backend/optimizer/path/costsize.c | 68 ++++
src/backend/optimizer/plan/createplan.c | 69 ++++
src/backend/optimizer/plan/setrefs.c | 11 +
src/backend/optimizer/plan/subselect.c | 1 +
src/backend/optimizer/util/pathnode.c | 22 ++
src/backend/parser/gram.y | 40 ++-
src/backend/parser/parse_clause.c | 38 ++-
src/backend/parser/parse_func.c | 130 ++++++++
src/backend/utils/Makefile | 3 +-
src/backend/utils/adt/ruleutils.c | 50 +++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/misc/sampling.c | 33 +-
src/backend/utils/tablesample/Makefile | 17 +
src/backend/utils/tablesample/bernoulli.c | 224 +++++++++++++
src/backend/utils/tablesample/system.c | 185 ++++++++++
src/include/access/heapam.h | 1 +
src/include/access/relscan.h | 1 +
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_proc.h | 25 ++
src/include/catalog/pg_tablesample_method.h | 75 +++++
src/include/executor/nodeSamplescan.h | 24 ++
src/include/nodes/execnodes.h | 18 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/parsenodes.h | 35 ++
src/include/nodes/plannodes.h | 6 +
src/include/optimizer/cost.h | 1 +
src/include/optimizer/pathnode.h | 2 +
src/include/parser/kwlist.h | 3 +-
src/include/parser/parse_func.h | 4 +
src/include/utils/rel.h | 1 -
src/include/utils/sampling.h | 15 +-
src/include/utils/syscache.h | 2 +
src/include/utils/tablesample.h | 27 ++
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/expected/tablesample.out | 165 +++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/tablesample.sql | 39 +++
58 files changed, 2288 insertions(+), 30 deletions(-)
create mode 100644 src/backend/executor/nodeSamplescan.c
create mode 100644 src/backend/utils/tablesample/Makefile
create mode 100644 src/backend/utils/tablesample/bernoulli.c
create mode 100644 src/backend/utils/tablesample/system.c
create mode 100644 src/include/catalog/pg_tablesample_method.h
create mode 100644 src/include/executor/nodeSamplescan.h
create mode 100644 src/include/utils/tablesample.h
create mode 100644 src/test/regress/expected/tablesample.out
create mode 100644 src/test/regress/sql/tablesample.sql
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index df732c0..3fc3962 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1096,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index cbcba6e..95f196e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2539,7 +2539,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * sampler_random_fract());
+ pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate));
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 515a40e..6b4e32b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -269,6 +269,11 @@
</row>
<row>
+ <entry><link linkend="catalog-pg-tablesample-method"><structname>pg_tablesample_method</structname></link></entry>
+ <entry>table sampling methods</entry>
+ </row>
+
+ <row>
<entry><link linkend="catalog-pg-tablespace"><structname>pg_tablespace</structname></link></entry>
<entry>tablespaces within this database cluster</entry>
</row>
@@ -5989,6 +5994,113 @@
</sect1>
+ <sect1 id="catalog-pg-tablesample-method">
+ <title><structname>pg_tablesample_method</structname></title>
+
+ <indexterm zone="catalog-pg-tablesample-method">
+ <primary>pg_tablesample_method</primary>
+ </indexterm>
+
+ <para>
+ The catalog <structname>pg_tablesample_method</structname> stores
+ information about table sampling methods which can be used in
+ <command>TABLESAMPLE</command> clause of a <command>SELECT</command>
+ statement.
+ </para>
+
+ <table>
+ <title><structname>pg_tablesample_method</> Columns</title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+
+ <row>
+ <entry><structfield>oid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry></entry>
+ <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmname</structfield></entry>
+ <entry><type>name</type></entry>
+ <entry></entry>
+ <entry>Name of the sampling method</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmseqscan</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method scan the whole table sequentially?
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsminit</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Initialize the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnextblock</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next block number</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnexttuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next tuple offset</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmexaminetuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Function which examines the tuple contents and decides whether to
+ return it, or zero if none</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmend</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>End the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmreset</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Restart the state of sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmcost</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Costing function</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect1>
+
+
<sect1 id="catalog-pg-tablespace">
<title><structname>pg_tablespace</structname></title>
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..407bf9d 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,42 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ A <literal>TABLESAMPLE</literal> clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of the rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ one of:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept only a single argument,
+ which is the percentage (a floating-point value from 0 to 100) of the
+ rows to be returned.
+ The <literal>SYSTEM</literal> sampling method does block-level
+ sampling, with each block having the same chance of being selected, and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> sampling method scans the whole table
+ and returns individual rows with equal probability.
+ The optional numeric parameter of <literal>REPEATABLE</literal> is used
+ as the random seed for sampling. Note that subsequent commands may return
+ different results even if the same <literal>REPEATABLE</literal> clause
+ was specified. This happens because <acronym>DML</acronym> statements
+ and maintenance operations such as <command>VACUUM</> affect the physical
+ distribution of data.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..238057a 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ transam
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 46060bc..e847ece 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -293,9 +293,10 @@ initscan(HeapScanDesc scan, ScanKey key, bool is_rescan)
/*
* Currently, we don't have a stats counter for bitmap heap scans (but the
- * underlying bitmap index scans will be counted).
+ * underlying bitmap index scans will be counted) or sample scans (we only
+ * update stats for tuple fetches there).
*/
- if (!scan->rs_bitmapscan)
+ if (!scan->rs_bitmapscan && !scan->rs_samplescan)
pgstat_count_heap_scan(scan->rs_rd);
}
@@ -314,7 +315,7 @@ heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, BlockNumber numBlks)
* In page-at-a-time mode it performs additional work, namely determining
* which tuples on the page are visible.
*/
-static void
+void
heapgetpage(HeapScanDesc scan, BlockNumber page)
{
Buffer buffer;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..34db3e6 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesample_method.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fc9dd44..63feb07 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1146,7 +1146,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7cfc9bb..22525af 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -732,6 +732,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -958,6 +959,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ pname = sname = "Sample Scan";
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1075,6 +1079,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1327,6 +1332,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2230,6 +2236,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 6ebad2f..4948a26 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index 1c8be25..5cfe549 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 9892499..03c2feb 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..34ea4ab
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,500 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/pg_tablesample_method.h"
+#include "executor/executor.h"
+#include "executor/nodeSamplescan.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/predicate.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate,
+ int eflags, TableSampleClause *tablesample);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+static HeapTuple samplenexttup(SampleScanState *node, HeapScanDesc scan);
+
+
+/*
+ * Initialize the sampling method - loads function info and
+ * calls the tsminit function.
+ *
+ * We need special handling for this because the tsminit function
+ * is allowed to take optional additional arguments.
+ */
+static void
+InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(scanstate->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(scanstate->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(scanstate->tsmnexttuple));
+ if (OidIsValid(tablesample->tsmexaminetuple))
+ fmgr_info(tablesample->tsmexaminetuple, &(scanstate->tsmexaminetuple));
+ else
+ scanstate->tsmexaminetuple.fn_oid = InvalidOid;
+ fmgr_info(tablesample->tsmend, &(scanstate->tsmend));
+ fmgr_info(tablesample->tsmreset, &(scanstate->tsmreset));
+
+ InitFunctionCallInfoData(fcinfo, &scanstate->tsminit,
+ list_length(args) + 2,
+ InvalidOid, NULL, NULL);
+
+ /* First arg is always SampleScanState */
+ fcinfo.arg[0] = PointerGetDatum(scanstate);
+ fcinfo.argnull[0] = false;
+
+ /*
+ * Second arg is always REPEATABLE
+ * When tablesample->repeatable is NULL then REPEATABLE clause was not
+ * specified.
+ * When specified, the expression cannot evaluate to NULL.
+ */
+ if (tablesample->repeatable)
+ {
+ ExprState *argstate = ExecInitExpr((Expr *) tablesample->repeatable,
+ (PlanState *) scanstate);
+ fcinfo.arg[1] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[1], NULL);
+ if (fcinfo.argnull[1])
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("REPEATABLE clause cannot be NULL")));
+ }
+ else
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+
+ i = 2;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ (void) FunctionCallInvoke(&fcinfo);
+}
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is a workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ TupleTableSlot *slot;
+ HeapScanDesc scan;
+ HeapTuple tuple;
+
+ /*
+ * get information from the scan state
+ */
+ slot = node->ss.ss_ScanTupleSlot;
+ scan = node->ss.ss_currentScanDesc;
+
+ tuple = samplenexttup(node, scan);
+
+ if (tuple)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ scan->rs_cbuf, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+static HeapTuple
+samplenexttup(SampleScanState *node, HeapScanDesc scan)
+{
+ HeapTuple tuple = &(scan->rs_ctup);
+ Snapshot snapshot = scan->rs_snapshot;
+ BlockNumber blockno;
+ Page page;
+ ItemId itemid;
+ OffsetNumber tupoffset,
+ maxoffset;
+
+ if (!scan->rs_inited)
+ {
+ /*
+ * return null immediately if relation is empty
+ */
+ if (scan->rs_nblocks == 0)
+ {
+ Assert(!BufferIsValid(scan->rs_cbuf));
+ tuple->t_data = NULL;
+ return NULL;
+ }
+ blockno = DatumGetInt32(FunctionCall1(&node->tsmnextblock,
+ PointerGetDatum(node)));
+ if (!BlockNumberIsValid(blockno))
+ {
+ tuple->t_data = NULL;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+ scan->rs_inited = true;
+ }
+ else
+ {
+ /* continue from previously returned page/tuple */
+ blockno = scan->rs_cblock; /* current page */
+ }
+
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ maxoffset = PageGetMaxOffsetNumber(page);
+
+ for (;;)
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ tupoffset = DatumGetUInt16(FunctionCall3(&node->tsmnexttuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset)));
+
+ if (OffsetNumberIsValid(tupoffset))
+ {
+ bool visible;
+ bool found;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_data = (HeapTupleHeader) PageGetItem((Page) page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&(tuple->t_self), blockno, tupoffset);
+
+ visible = HeapTupleSatisfiesVisibility(tuple, snapshot, scan->rs_cbuf);
+
+ CheckForSerializableConflictOut(visible, scan->rs_rd, tuple,
+ scan->rs_cbuf, snapshot);
+
+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples for
+ * statistical purposes, but not return them since user should
+ * never see invisible tuples.
+ */
+ if (OidIsValid(node->tsmexaminetuple.fn_oid))
+ {
+ found = DatumGetBool(FunctionCall4(&node->tsmexaminetuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ PointerGetDatum(tuple),
+ BoolGetDatum(visible)));
+ /* Should not happen if sampling method is well written. */
+ if (found && !visible)
+ elog(ERROR, "sampling method wanted to return invisible tuple");
+ }
+ else
+ found = visible;
+
+ /* Found visible tuple, return it. */
+ if (found)
+ {
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ break;
+ }
+ else
+ {
+ /* Try next tuple from same page. */
+ continue;
+ }
+ }
+
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ blockno = DatumGetInt32(FunctionCall1(&node->tsmnextblock,
+ PointerGetDatum(node)));
+
+ /*
+ * Report our new scan position for synchronization purposes. A
+ * sample scan only ever moves forward, so there is no backward case
+ * to exclude here.
+ *
+ * Note: we do this before checking for end of scan so that the
+ * final state of the position hint is back at the start of the
+ * rel. That's not strictly necessary, but otherwise when you run
+ * the same query multiple times the starting position would shift
+ * a little bit backwards on every invocation, which is confusing.
+ * We don't guarantee any specific ordering in general, though.
+ */
+ if (scan->rs_syncscan)
+ ss_report_location(scan->rs_rd, BlockNumberIsValid(blockno) ?
+ blockno : scan->rs_startblock);
+
+ /*
+ * Reached end of scan.
+ */
+ if (!BlockNumberIsValid(blockno))
+ {
+ if (BufferIsValid(scan->rs_cbuf))
+ ReleaseBuffer(scan->rs_cbuf);
+ scan->rs_cbuf = InvalidBuffer;
+ scan->rs_cblock = InvalidBlockNumber;
+ tuple->t_data = NULL;
+ scan->rs_inited = false;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ pgstat_count_heap_getnext(scan->rs_rd);
+
+ return &(scan->rs_ctup);
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation using the sampling method and returns
+ * the next qualifying tuple.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+ TableSampleClause *tablesample)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+
+ /*
+ * Even though we aren't going to do a conventional seqscan, it is useful
+ * to create a HeapScanDesc --- many of the fields in it are usable.
+ */
+ node->ss.ss_currentScanDesc =
+ heap_beginscan_strat(currentRelation,
+ estate->es_snapshot,
+ 0, NULL, false,
+ tablesample->tsmseqscan);
+
+ /*
+ * Page at a time mode is useless for us as we need to check visibility
+ * of tuples individually because tuple offsets returned by sampling
+ * methods map to rs_vistuples values and not to its indexes.
+ */
+ node->ss.ss_currentScanDesc->rs_pageatatime = false;
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags, rte->tablesample);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ InitSamplingMethod(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell sampling function that we finished the scan.
+ */
+ FunctionCall1(&node->tsmend, PointerGetDatum(node));
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close heap scan
+ */
+ heap_endscan(node->ss.ss_currentScanDesc);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *node)
+{
+ heap_rescan(node->ss.ss_currentScanDesc, NULL);
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ FunctionCall1(&node->tsmreset, PointerGetDatum(node));
+
+ ExecScanReScan(&node->ss);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f1a24f5..9cff63e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -628,6 +628,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2006,6 +2022,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2138,6 +2155,39 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsmseqscan);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmexaminetuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_SCALAR_FIELD(tsmcost);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4075,6 +4125,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4723,6 +4776,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_PrivGrantee:
retval = _copyPrivGrantee(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 6e8b308..165c4e5 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2323,6 +2323,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2442,6 +2443,35 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsmseqscan);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmexaminetuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_SCALAR_FIELD(tsmcost);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3150,6 +3180,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_PrivGrantee:
retval = _equalPrivGrantee(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 21dfda7..bd9ce09 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3209,6 +3209,18 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->repeatable, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index dd1278b..8dcd02a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -578,6 +578,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -2391,6 +2399,35 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_BOOL_FIELD(tsmseqscan);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmexaminetuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2420,6 +2457,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2887,6 +2925,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3228,6 +3269,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index ae24d05..076c958 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,45 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_BOOL_FIELD(tsmseqscan);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmexaminetuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_OID_FIELD(tsmcost);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1216,6 +1255,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1311,6 +1351,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..5f12477 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,10 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -265,6 +269,11 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_size(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Sampled relation */
+ set_tablesample_rel_size(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -332,6 +341,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +432,41 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_size
+ * Set size estimates for a sampled relation.
+ */
+static void
+set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ /* Mark rel with estimated output rows, width, etc */
+ set_baserel_size_estimates(root, rel);
+}
+
+/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+ Path *path;
+
+ /*
+ * We don't support pushing join clauses into the quals of a sample scan,
+ * but it could still have required parameterization due to LATERAL refs
+ * in its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ path = create_samplescan_path(root, rel, required_outer);
+ rel->pathlist = list_make1(path);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 020558b..2167716 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -90,6 +90,7 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
#include "utils/spccache.h"
+#include "utils/tablesample.h"
#include "utils/tuplesort.h"
@@ -219,6 +220,73 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * From the planner/optimizer perspective, we don't care all that much about
+ * the cost itself, since there is only ever one scan path to consider when a
+ * sampling scan is present, but the row count estimate is still important.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'path' is the path to cost; its param_info is non-NULL if parameterized
+ */
+void
+cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+ BlockNumber pages;
+ double tuples;
+ RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
+ TableSampleClause *tablesample = rte->tablesample;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Mark the path with the correct row estimate */
+ if (path->param_info)
+ path->rows = path->param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ /* Call the sampling method's costing function. */
+ OidFunctionCall6(tablesample->tsmcost, PointerGetDatum(root),
+ PointerGetDatum(path), PointerGetDatum(baserel),
+ PointerGetDatum(tablesample->args),
+ PointerGetDatum(&pages), PointerGetDatum(&tuples));
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+
+ spc_page_cost = tablesample->tsmseqscan ? spc_seq_page_cost :
+ spc_random_page_cost;
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_page_cost * pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, path->param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * tuples;
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 655be81..10a5e02 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -58,6 +58,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -100,6 +102,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -228,6 +231,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -343,6 +347,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -546,6 +557,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1133,6 +1145,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3318,6 +3369,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7703946..de33fc6 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -446,6 +446,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 78fb6b1..191624c 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2163,6 +2163,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 1395a21..014d670 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,26 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+Path *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ Path *pathnode = makeNode(Path);
+
+ pathnode->pathtype = T_SampleScan;
+ pathnode->parent = rel;
+ pathnode->param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->pathkeys = NIL; /* samplescan has unordered result */
+
+ cost_samplescan(pathnode, root, rel);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1921,6 +1941,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return (Path *) create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 36dac29..ac5e095 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -447,6 +447,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -611,8 +612,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10227,6 +10228,12 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample opt_alias_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10522,7 +10529,6 @@ relation_expr_list:
| relation_expr_list ',' relation_expr { $$ = lappend($1, $3); }
;
-
/*
* Given "UPDATE foo set set ...", we have to decide without looking any
* further ahead whether the first "set" is an alias or the UPDATE's SET
@@ -10552,6 +10558,31 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $2;
+ n->relation = $1;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = $4;
+ n->repeatable = $6;
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13334,7 +13365,6 @@ unreserved_keyword:
| RELATIVE_P
| RELEASE
| RENAME
- | REPEATABLE
| REPLACE
| REPLICA
| RESET
@@ -13509,6 +13539,7 @@ type_func_name_keyword:
| OVERLAPS
| RIGHT
| SIMILAR
+ | TABLESAMPLE
| VERBOSE
;
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+ | REPEATABLE
| RETURNING
| SELECT
| SESSION_USER
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 654dce6..03632d2 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -413,6 +416,19 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
+{
+ RangeTblEntry *rte;
+ TableSampleClause *tablesample = NULL;
+
+ rte = transformTableEntry(pstate, r->relation);
+ tablesample = ParseTableSample(pstate, r->method, r->repeatable, r->args);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -421,7 +437,7 @@ transformTableEntry(ParseState *pstate, RangeVar *r)
{
RangeTblEntry *rte;
- /* We need only build a range table entry */
+ /* We first need to build a range table entry */
rte = addRangeTableEntry(pstate, r, r->alias,
interpretInhOption(r->inhOpt), true);
@@ -1121,6 +1137,26 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index a200804..541f415 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesample_method.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -760,6 +762,134 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
+ List *sampleargs)
+{
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, initnargs;
+ Oid actual_arg_types[FUNC_MAX_ARGS];
+ Oid init_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ samplemethod)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+
+ tablesample->tsmseqscan = tsm->tsmseqscan;
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmexaminetuple = tsm->tsmexaminetuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+ tablesample->tsmcost = tsm->tsmcost;
+
+ ReleaseSysCache(tuple);
+
+ /* Validate the parameters against init function definition. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ initnargs = procform->pronargs;
+ Assert(initnargs >= 3);
+
+ /*
+ * The first parameter passes the SampleScanState and the second is the
+ * seed (REPEATABLE). Skip processing them here and just assert that
+ * their types are correct.
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ Assert(procform->proargtypes.values[1] == INT4OID);
+ initnargs -= 2;
+ memcpy(init_arg_types, procform->proargtypes.values + 2,
+ initnargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Process repeatable (seed) */
+ if (repeatable != NULL)
+ {
+ Node *arg = repeatable;
+
+ if (arg && IsA(arg, A_Const))
+ {
+ A_Const *con = (A_Const *) arg;
+
+ if (con->val.type == T_Null)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value"),
+ parser_errposition(pstate, con->location)));
+
+ }
+
+ arg = transformExpr(pstate, arg, EXPR_KIND_FROM_FUNCTION);
+ arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
+ tablesample->repeatable = arg;
+ }
+ else
+ tablesample->repeatable = NULL;
+
+ /* Transform the rest of arguments ... */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *arg = transformExpr(pstate, (Node *) lfirst(larg), EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ fargs = lappend(fargs, arg);
+
+ actual_arg_types[nargs++] = argtype;
+ }
+
+ /*
+ * Check if parameters are correct.
+ *
+ * XXX: can we do better at hinting here?
+ */
+ if (initnargs != nargs ||
+ !can_coerce_type(initnargs, actual_arg_types, init_arg_types,
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameters for tablesample method \"%s\"",
+ samplemethod)));
+
+ /* perform the necessary typecasting of arguments */
+ make_fn_arguments(pstate, fargs, actual_arg_types, init_arg_types);
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
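
Reviewer's sketch (not part of the patch): the argument check in ParseTableSample above can be read as "drop the two reserved leading parameters of the init function, then compare what is left against the user's arguments". The standalone C below illustrates only that shape. The helper name tablesample_args_match is hypothetical, and exact type equality is an assumption standing in for can_coerce_type(), which also accepts implicit casts.

```c
#include <stdbool.h>

/* Type OIDs as listed in pg_type; used here only as opaque integers. */
#define INTERNALOID 2281
#define INT4OID     23
#define FLOAT4OID   700

typedef unsigned int Oid;

/*
 * Hypothetical helper: returns true when the user-supplied argument types
 * match the init function's declared types once the two reserved leading
 * parameters (scan state, seed) have been skipped.
 *
 * Assumption: exact equality replaces the implicit-coercion check.
 */
bool
tablesample_args_match(const Oid *init_types, int init_nargs,
                       const Oid *actual_types, int actual_nargs)
{
    int i;

    /* The init function must be declared as (internal, int4, ...). */
    if (init_nargs < 2 ||
        init_types[0] != INTERNALOID ||
        init_types[1] != INT4OID)
        return false;

    /* Skip the two reserved parameters. */
    init_types += 2;
    init_nargs -= 2;

    if (init_nargs != actual_nargs)
        return false;

    for (i = 0; i < init_nargs; i++)
    {
        if (init_types[i] != actual_types[i])
            return false;
    }
    return true;
}
```

With an init signature of (internal, int4, float4), a single float4 argument matches, while a single int4 argument or an empty argument list does not.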
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..9daa2ae 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,8 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time \
+ tablesample
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index c1d860c..8198fc7 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -31,6 +31,7 @@
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -343,6 +344,8 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags);
static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags, int wrapColumn);
+static void get_tablesample_def(TableSampleClause *tablesample,
+ deparse_context *context);
static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
TupleDesc resultDesc,
int prettyFlags, int wrapColumn, int startIndent);
@@ -4184,6 +4187,50 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
heap_close(ev_relation, AccessShareLock);
}
+/* ----------
+ * get_tablesample_def - Convert TableSampleClause back to SQL
+ * ----------
+ */
+static void
+get_tablesample_def(TableSampleClause *tablesample, deparse_context *context)
+{
+ StringInfo buf = context->buf;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ char *tsmname;
+ int nargs;
+ ListCell *l;
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tablesample->tsmid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("cache lookup failed for tablesample method %u",
+ tablesample->tsmid)));
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ tsmname = NameStr(tsm->tsmname);
+ appendStringInfo(buf, " TABLESAMPLE %s (", quote_identifier(tsmname));
+
+ ReleaseSysCache(tuple);
+
+ nargs = 0;
+ foreach(l, tablesample->args)
+ {
+ if (nargs++ > 0)
+ appendStringInfoString(buf, ", ");
+ get_rule_expr((Node *) lfirst(l), context, true);
+ }
+ appendStringInfoChar(buf, ')');
+
+ if (tablesample->repeatable != NULL)
+ {
+ appendStringInfoString(buf, " REPEATABLE (");
+ get_rule_expr(tablesample->repeatable, context, true);
+ appendStringInfoChar(buf, ')');
+ }
+}
/* ----------
* get_query_def - Parse back one query parsetree
@@ -8411,6 +8458,9 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
only_marker(rte),
generate_relation_name(rte->relid,
context->namespaces));
+
+ if (rte->tablesample)
+ get_tablesample_def(rte->tablesample, context);
break;
case RTE_SUBQUERY:
/* Subquery RTE */
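
Reviewer's sketch (not part of the patch): get_tablesample_def above emits the method name, a comma-separated argument list, and an optional REPEATABLE clause. The standalone C below mimics that output format with plain strings so the expected text is easy to see. Both function names are hypothetical, the arguments are assumed to be already deparsed, and identifier quoting (quote_identifier in the patch) is left out.

```c
#include <string.h>

/* Append src to the string in buf; returns 0 on success, -1 if it does not fit. */
int
sketch_append(char *buf, size_t buflen, const char *src)
{
    size_t used = strlen(buf);
    size_t need = strlen(src);

    if (used + need + 1 > buflen)
        return -1;
    memcpy(buf + used, src, need + 1);
    return 0;
}

/*
 * Hypothetical stand-in for the deparse routine: writes
 * " TABLESAMPLE method (arg, ...)" and, when repeatable is not NULL,
 * " REPEATABLE (seed)". Returns 0 on success, -1 if buf is too small.
 */
int
sketch_deparse_tablesample(char *buf, size_t buflen, const char *method,
                           const char *const *args, int nargs,
                           const char *repeatable)
{
    int i;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    if (sketch_append(buf, buflen, " TABLESAMPLE ") != 0 ||
        sketch_append(buf, buflen, method) != 0 ||
        sketch_append(buf, buflen, " (") != 0)
        return -1;

    for (i = 0; i < nargs; i++)
    {
        /* Separator goes before every argument except the first. */
        if (i > 0 && sketch_append(buf, buflen, ", ") != 0)
            return -1;
        if (sketch_append(buf, buflen, args[i]) != 0)
            return -1;
    }
    if (sketch_append(buf, buflen, ")") != 0)
        return -1;

    if (repeatable != NULL)
    {
        if (sketch_append(buf, buflen, " REPEATABLE (") != 0 ||
            sketch_append(buf, buflen, repeatable) != 0 ||
            sketch_append(buf, buflen, ")") != 0)
            return -1;
    }
    return 0;
}
```

Called with method "bernoulli", one argument "10" and seed "42", the buffer holds " TABLESAMPLE bernoulli (10) REPEATABLE (42)"; with a NULL seed the REPEATABLE part is omitted.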
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..3a8f01e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesample_method_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index 1eeabaf..f213c46 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -46,6 +46,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
bs->n = samplesize;
bs->t = 0; /* blocks scanned so far */
bs->m = 0; /* blocks selected so far */
+
+ sampler_random_init_state(randseed, bs->randstate);
}
bool
@@ -92,7 +94,7 @@ BlockSampler_Next(BlockSampler bs)
* less than k, which means that we cannot fail to select enough blocks.
*----------
*/
- V = sampler_random_fract();
+ V = sampler_random_fract(bs->randstate);
p = 1.0 - (double) k / (double) K;
while (V < p)
{
@@ -126,8 +128,14 @@ BlockSampler_Next(BlockSampler bs)
void
reservoir_init_selection_state(ReservoirState rs, int n)
{
+ /*
+ * Reservoir sampling is not used anywhere that requires repeatable
+ * results, so we can initialize it randomly.
+ */
+ sampler_random_init_state(random(), rs->randstate);
+
/* Initial value of W (for use when Algorithm Z is first applied) */
- *rs = exp(-log(sampler_random_fract()) / n);
+ rs->W = exp(-log(sampler_random_fract(rs->randstate)) / n);
}
double
@@ -142,7 +150,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
double V,
quot;
- V = sampler_random_fract(); /* Generate V */
+ V = sampler_random_fract(rs->randstate); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -158,7 +166,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
else
{
/* Now apply Algorithm Z */
- double W = *rs;
+ double W = rs->W;
double term = t - (double) n + 1;
for (;;)
@@ -174,7 +182,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
tmp;
/* Generate U and X */
- U = sampler_random_fract();
+ U = sampler_random_fract(rs->randstate);
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -203,11 +211,11 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract(rs->randstate)) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
- *rs = W;
+ rs->W = W;
}
return S;
}
@@ -217,10 +225,17 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
* Random number generator used by sampling
*----------
*/
+void
+sampler_random_init_state(long seed, SamplerRandomState randstate)
+{
+ randstate[0] = 0x330e;
+ randstate[1] = (unsigned short) seed;
+ randstate[2] = (unsigned short) (seed >> 16);
+}
/* Select a random value R uniformly distributed in (0 - 1) */
double
-sampler_random_fract()
+sampler_random_fract(SamplerRandomState randstate)
{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+ return pg_erand48(randstate);
}
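
Reviewer's sketch (not part of the patch): the point of replacing random() with a per-sampler state is that two samplers seeded with the same value yield the same sequence, which is what REPEATABLE needs. The standalone C below shows this with a hand-written 48-bit linear congruential generator using the usual drand48-family constants. The sketch_ names are hypothetical, and the patch itself calls pg_erand48 rather than this code.

```c
#include <stdint.h>

typedef unsigned short SketchRandomState[3];

/* Seed the state the same way the patch does: fixed low word, seed in the rest. */
void
sketch_random_init_state(long seed, SketchRandomState state)
{
    state[0] = 0x330e;
    state[1] = (unsigned short) seed;
    state[2] = (unsigned short) (seed >> 16);
}

/*
 * Advance the 48-bit LCG (multiplier 0x5DEECE66D, increment 0xB) and return
 * a value in [0, 1). Assumption: this mirrors the erand48 family; it is a
 * stand-in for pg_erand48, not a copy of it.
 */
double
sketch_random_fract(SketchRandomState state)
{
    uint64_t x = ((uint64_t) state[2] << 32) |
                 ((uint64_t) state[1] << 16) |
                 (uint64_t) state[0];

    x = (x * UINT64_C(0x5DEECE66D) + UINT64_C(0xB)) & UINT64_C(0xFFFFFFFFFFFF);

    state[0] = (unsigned short) (x & 0xFFFF);
    state[1] = (unsigned short) ((x >> 16) & 0xFFFF);
    state[2] = (unsigned short) ((x >> 32) & 0xFFFF);

    /* Divide by 2^48 to scale into [0, 1). */
    return (double) x / 281474976710656.0;
}
```

Two states initialized with the same seed produce identical values on every call, and re-seeding (as the reset functions do on ReScan) restarts the sequence from the beginning.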
diff --git a/src/backend/utils/tablesample/Makefile b/src/backend/utils/tablesample/Makefile
new file mode 100644
index 0000000..df92939
--- /dev/null
+++ b/src/backend/utils/tablesample/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/tablesample
+#
+# IDENTIFICATION
+# src/backend/utils/tablesample/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = system.o bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/tablesample/bernoulli.c b/src/backend/utils/tablesample/bernoulli.c
new file mode 100644
index 0000000..f7e9688
--- /dev/null
+++ b/src/backend/utils/tablesample/bernoulli.c
@@ -0,0 +1,224 @@
+/*-------------------------------------------------------------------------
+ *
+ * bernoulli.c
+ * interface routines for BERNOULLI tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/relscan.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* number of blocks */
+ BlockNumber blockno; /* current block */
+ float4 probability; /* probability that tuple will be returned (0.0-1.0) */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} BernoulliSamplerData;
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ HeapScanDesc scan = scanstate->ss.ss_currentScanDesc;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->startblock = scan->rs_startblock;
+ sampler->nblocks = scan->rs_nblocks;
+ sampler->blockno = InvalidBlockNumber;
+ sampler->probability = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ /*
+ * Bernoulli sampling scans all blocks of the table and supports
+ * syncscan, so loop from startblock around to startblock instead of
+ * from 0 to nblocks.
+ */
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = sampler->startblock;
+ else
+ {
+ sampler->blockno++;
+
+ if (sampler->blockno >= sampler->nblocks)
+ sampler->blockno = 0;
+
+ if (sampler->blockno == sampler->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ *
+ * This method implements the main logic of Bernoulli sampling.
+ * The algorithm generates a new random number (in the 0.0-1.0 range) and, if
+ * it falls within the user-specified probability (in the same range), returns
+ * the tuple offset.
+ *
+ * If we reach the end of the block, return InvalidOffsetNumber, which tells
+ * SampleScan to go to the next block.
+ */
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ float4 probability = sampler->probability;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /*
+ * Loop over tuple offsets until the random generator returns value that
+ * is within the probability of returning the tuple or until we reach
+ * end of the block.
+ *
+ * (This is our implementation of bernoulli trial)
+ */
+ while (sampler_random_fract(sampler->randstate) > probability)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_bernoulli_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ *pages = baserel->pages;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1;
+ }
+
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
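
Reviewer's sketch (not part of the patch): the per-block Bernoulli trial in tsm_bernoulli_nexttuple can be modeled without any executor state. The standalone C below runs one trial per candidate offset and returns the first offset that passes, or 0 at the end of the block. All names are hypothetical and the random source is a simple 48-bit LCG stand-in. Note two deliberate differences from the patch: the bound is checked before drawing, and the comparison is >= rather than >, so that a probability of 0.0 never selects a tuple while 1.0 always does.

```c
#include <stdint.h>

#define SKETCH_INVALID_OFFSET 0     /* mirrors InvalidOffsetNumber */
#define SKETCH_FIRST_OFFSET   1     /* mirrors FirstOffsetNumber */

typedef struct
{
    uint64_t rng;           /* 48-bit LCG state (stand-in for the sampler state) */
    double   probability;   /* chance that a tuple is returned, 0.0 to 1.0 */
    unsigned last;          /* last offset returned from the current block */
} BernoulliSketch;

/* Return the next pseudo-random value in [0, 1). */
double
sketch_fract(uint64_t *rng)
{
    *rng = (*rng * UINT64_C(0x5DEECE66D) + UINT64_C(0xB)) &
           UINT64_C(0xFFFFFFFFFFFF);
    return (double) *rng / 281474976710656.0;
}

/*
 * Return the next selected tuple offset in the current block, or
 * SKETCH_INVALID_OFFSET once the block is exhausted.
 */
unsigned
bernoulli_next_offset(BernoulliSketch *s, unsigned maxoffset)
{
    unsigned off;

    if (s->last == SKETCH_INVALID_OFFSET)
        off = SKETCH_FIRST_OFFSET;
    else
        off = s->last + 1;

    /* One trial per candidate offset; skip candidates that fail it. */
    while (off <= maxoffset && sketch_fract(&s->rng) >= s->probability)
        off++;

    if (off > maxoffset)
        off = SKETCH_INVALID_OFFSET;    /* caller should move to the next block */

    s->last = off;
    return off;
}
```

With probability 1.0 and three tuples in a block, successive calls return 1, 2, 3 and then 0; with probability 0.0 the first call already returns 0.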
diff --git a/src/backend/utils/tablesample/system.c b/src/backend/utils/tablesample/system.c
new file mode 100644
index 0000000..0c4da28
--- /dev/null
+++ b/src/backend/utils/tablesample/system.c
@@ -0,0 +1,185 @@
+/*-------------------------------------------------------------------------
+ *
+ * system.c
+ * interface routines for SYSTEM tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/*
+ * State
+ */
+typedef struct
+{
+ BlockSamplerData bs;
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ int samplesize; /* number of blocks to return */
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks =
+ RelationGetNumberOfBlocks(scanstate->ss.ss_currentRelation);
+ sampler->samplesize = 1 + (int) (sampler->tblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses the same logic as ANALYZE for picking the random blocks.
+ */
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1;
+ }
+
+ *pages = baserel->pages * samplesize;
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
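
Reviewer's sketch (not part of the patch): the arithmetic in tsm_system_init and tsm_system_cost is easy to check by hand. The standalone C below repeats the two formulas, the number of blocks requested from the block sampler and the page and tuple estimates handed to the planner, with plain doubles. The struct and function names are hypothetical.

```c
/* Planner-side estimate produced by the costing step. */
typedef struct
{
    double pages;   /* pages expected to be read */
    double tuples;  /* rows expected to be returned */
} SampleEstimate;

/* Blocks requested from the block sampler, as in tsm_system_init. */
int
system_sample_blocks(unsigned total_blocks, double percent)
{
    return 1 + (int) (total_blocks * (percent / 100.0));
}

/* Page and row estimates, as in tsm_system_cost with a constant percentage. */
SampleEstimate
system_estimate(double rel_pages, double rel_rows, double percent)
{
    SampleEstimate est;
    double fraction = percent / 100.0;

    est.pages = rel_pages * fraction;
    est.tuples = rel_rows * fraction;
    return est;
}
```

For a 200-block table sampled at 50 percent this requests 101 blocks, and a 1000-page, 80000-row relation sampled at 25 percent is estimated at 250 pages and 20000 rows. Because of the leading "1 +", a 0 percent sample still requests one block under this formula.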
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 939d93d..9dabcb0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -115,6 +115,7 @@ extern HeapScanDesc heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key);
extern void heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk,
BlockNumber endBlk);
+extern void heapgetpage(HeapScanDesc scan, BlockNumber page);
extern void heap_rescan(HeapScanDesc scan, ScanKey key);
extern void heap_endscan(HeapScanDesc scan);
extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 9bb6362..e2b2b4f 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -29,6 +29,7 @@ typedef struct HeapScanDescData
int rs_nkeys; /* number of scan keys */
ScanKey rs_key; /* array of scan key descriptors */
bool rs_bitmapscan; /* true if this is really a bitmap scan */
+ bool rs_samplescan; /* true if this is really a sample scan */
bool rs_pageatatime; /* verify visibility page-at-a-time? */
bool rs_allow_strat; /* allow or disallow use of access strategy */
bool rs_allow_sync; /* allow or disallow use of syncscan */
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..c711cca 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_name_index, 3281, on pg_tablesample_method using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3281
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_oid_index, 3282, on pg_tablesample_method using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3282
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 9edfdb8..e6d821d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5143,6 +5143,31 @@ DESCR("rank of hypothetical row without gaps");
DATA(insert OID = 3993 ( dense_rank_final PGNSP PGUID 12 1 0 2276 0 f f f f f f i 2 0 20 "2281 2276" "{2281,2276}" "{i,v}" _null_ _null_ hypothetical_dense_rank_final _null_ _null_ _null_ ));
DESCR("aggregate final function");
+DATA(insert OID = 3285 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3286 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3287 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3288 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3289 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+DATA(insert OID = 3290 ( tsm_system_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_system_cost _null_ _null_ _null_ ));
+DESCR("tsm_system_cost(internal)");
+
+DATA(insert OID = 3291 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3292 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3293 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3294 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3296 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
+DATA(insert OID = 3297 ( tsm_bernoulli_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_cost(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
new file mode 100644
index 0000000..fd76f77
--- /dev/null
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesample_method.h
+ * definition of the table sampling methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesample_method.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLE_METHOD_H
+#define PG_TABLESAMPLE_METHOD_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_tablesample_method definition. cpp turns this into
+ * typedef struct FormData_pg_tablesample_method
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3280
+
+CATALOG(pg_tablesample_method,3280)
+{
+ NameData tsmname; /* tablesample method name */
+ bool tsmseqscan; /* does this method scan whole table sequentially? */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockNumber if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmexaminetuple; /* optional function which can examine tuple contents and
+ decide if tuple should be returned or not */
+ regproc tsmend; /* end scan function */
+ regproc tsmreset; /* reset state - used by rescan */
+ regproc tsmcost; /* costing function */
+} FormData_pg_tablesample_method;
+
+/* ----------------
+ * Form_pg_tablesample_method corresponds to a pointer to a tuple with
+ * the format of pg_tablesample_method relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
+
+/* ----------------
+ * compiler constants for pg_tablesample_method
+ * ----------------
+ */
+#define Natts_pg_tablesample_method 9
+#define Anum_pg_tablesample_method_tsmname 1
+#define Anum_pg_tablesample_method_tsmseqscan 2
+#define Anum_pg_tablesample_method_tsminit 3
+#define Anum_pg_tablesample_method_tsmnextblock 4
+#define Anum_pg_tablesample_method_tsmnexttuple 5
+#define Anum_pg_tablesample_method_tsmexaminetuple 6
+#define Anum_pg_tablesample_method_tsmend 7
+#define Anum_pg_tablesample_method_tsmreset 8
+#define Anum_pg_tablesample_method_tsmcost 9
+
+/* ----------------
+ * initial contents of pg_tablesample_method
+ * ----------------
+ */
+
+DATA(insert OID = 3283 ( system false tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3284 ( bernoulli true tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41288ed..e913924 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1212,6 +1212,24 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+
+ /* Sampling method functions. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmexaminetuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+
+ void *tsmdata; /* for use by table scan method */
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 97ef0fc..3276be8 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,6 +61,7 @@ typedef enum NodeTag
T_ValuesScan,
T_CteScan,
T_WorkTableScan,
+ T_SampleScan,
T_ForeignScan,
T_CustomScan,
T_Join,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -413,6 +415,8 @@ typedef enum NodeTag
T_XmlSerialize,
T_WithClause,
T_CommonTableExpr,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b1dfa85..2f4df1d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -307,6 +307,25 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - information about a sampling method
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ bool tsmseqscan;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmexaminetuple;
+ Oid tsmend;
+ Oid tsmreset;
+ Oid tsmcost;
+ Node *repeatable;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -507,6 +526,21 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than the SQL standard, so we pass generic function
+ * arguments to the sampling method.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ Node *repeatable;
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -751,6 +785,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 316c9ce..8a2a146 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -278,6 +278,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..24003ae 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9923f0e..89c8ded 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c243ec..6ff7b44 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -312,7 +312,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD)
PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD)
PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD)
-PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD)
+PG_KEYWORD("repeatable", REPEATABLE, RESERVED_KEYWORD)
PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD)
PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD)
PG_KEYWORD("reset", RESET, UNRESERVED_KEYWORD)
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 3264691..6727e55 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,10 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod,
+ Node *repeatable, List *args);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 6bd786d..185bd81 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index e3e7f9c..4ac208d 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -15,7 +15,12 @@
#include "storage/bufmgr.h"
-extern double sampler_random_fract(void);
+/* Random generator for sampling code */
+typedef unsigned short SamplerRandomState[3];
+
+extern void sampler_random_init_state(long seed,
+ SamplerRandomState randstate);
+extern double sampler_random_fract(SamplerRandomState randstate);
/* Block sampling methods */
/* Data structure for Algorithm S from Knuth 3.4.2 */
@@ -25,6 +30,7 @@ typedef struct
int n; /* desired sample size */
BlockNumber t; /* current block number */
int m; /* blocks selected so far */
+ SamplerRandomState randstate; /* random generator state */
} BlockSamplerData;
typedef BlockSamplerData *BlockSampler;
@@ -35,7 +41,12 @@ extern bool BlockSampler_HasMore(BlockSampler bs);
extern BlockNumber BlockSampler_Next(BlockSampler bs);
/* Reservoid sampling methods */
-typedef double ReservoirStateData;
+typedef struct
+{
+ double W;
+ SamplerRandomState randstate; /* random generator state */
+} ReservoirStateData;
+
typedef ReservoirStateData *ReservoirState;
extern void reservoir_init_selection_state(ReservoirState rs, int n);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..6b628f6 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/include/utils/tablesample.h b/src/include/utils/tablesample.h
new file mode 100644
index 0000000..1a24cb6
--- /dev/null
+++ b/src/include/utils/tablesample.h
@@ -0,0 +1,27 @@
+/*--------------------------------------------------------------------------
+ * tablesample.h
+ * Header file for builtin table sampling methods.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/utils/tablesample.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TABLESAMPLE_H
+#define TABLESAMPLE_H
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_system_cost(PG_FUNCTION_ARGS);
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_cost(PG_FUNCTION_ARGS);
+
+#endif /* TABLESAMPLE_H */
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..5946edf 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesample_method|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..9b387a2
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,165 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+ count
+-------
+ 10
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+ pg_get_viewdef
+--------------------------------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system (((10 * 2))::real) REPEATABLE (2);
+(1 row)
+
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+ pg_get_viewdef
+-----------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system ((99)::real);
+(1 row)
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+CLOSE tablesample_cur;
+END;
+DROP TABLE test_tablesample CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to view test_tablesample_v1
+drop cascades to view test_tablesample_v2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e0ae2f2..e0240ac 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address tablesample
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7f762bd..9a7611b 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -152,3 +152,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..2b89b55
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,39 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+CLOSE tablesample_cur;
+END;
+
+DROP TABLE test_tablesample CASCADE;
--
1.9.1
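
The sampling.h hunk in the patch above replaces the global random generator with a per-scan SamplerRandomState (three unsigned shorts), which is what lets REPEATABLE (seed) return the same rows on every scan and rescan. Below is a minimal standalone sketch of that idea, not the patch's implementation: the sketch_* names are hypothetical, the seeding scheme is an assumption, and the generator is a plain 48-bit linear congruential step in the style of erand48.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the SamplerRandomState typedef added to sampling.h. */
typedef unsigned short SketchRandomState[3];

/* Hypothetical stand-in for sampler_random_init_state(): spread the seed
 * over the three 16-bit words of the state (assumed seeding scheme). */
static void
sketch_init_state(long seed, SketchRandomState st)
{
	unsigned long useed = (unsigned long) seed;

	st[0] = 0x330E;
	st[1] = (unsigned short) (useed & 0xFFFF);
	st[2] = (unsigned short) ((useed >> 16) & 0xFFFF);
}

/* Hypothetical stand-in for sampler_random_fract(): advance the 48-bit
 * state by one LCG step and return a value in [0, 1). */
static double
sketch_random_fract(SketchRandomState st)
{
	uint64_t	x = ((uint64_t) st[2] << 32) | ((uint64_t) st[1] << 16) | st[0];

	x = (x * UINT64_C(0x5DEECE66D) + 0xB) & UINT64_C(0xFFFFFFFFFFFF);
	st[0] = (unsigned short) (x & 0xFFFF);
	st[1] = (unsigned short) ((x >> 16) & 0xFFFF);
	st[2] = (unsigned short) ((x >> 32) & 0xFFFF);
	return (double) x / 281474976710656.0;	/* divide by 2^48 */
}

/* Bernoulli decision: keep a tuple with probability percent / 100. */
static bool
sketch_bernoulli_keep(SketchRandomState st, double percent)
{
	assert(percent >= 0.0 && percent <= 100.0);
	return sketch_random_fract(st) < percent / 100.0;
}

/* Run one "scan" over ntuples tuples and count how many are kept. */
static int
sketch_sample_count(long seed, int ntuples, double percent)
{
	SketchRandomState st;
	int			kept = 0;

	sketch_init_state(seed, st);
	for (int i = 0; i < ntuples; i++)
	{
		if (sketch_bernoulli_keep(st, percent))
			kept++;
	}
	return kept;
}
```

Because the state lives with the scan instead of in a global, two scans seeded with the same value make identical keep/skip decisions, and a concurrent scan with a different seed cannot disturb them. This matches what the cursor test in tablesample.sql exercises by running a second TABLESAMPLE query between FETCH calls.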
Attachment: 0003-tablesample-ddl-v4.patch (text/x-diff)
From f04d4842d22e18b79a306fa3693bad02106cde2b Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:51:44 +0100
Subject: [PATCH 3/3] tablesample-ddl v4
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_tablesamplemethod.sgml | 173 +++++++++
doc/src/sgml/ref/drop_tablesamplemethod.sgml | 87 +++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/dependency.c | 15 +-
src/backend/catalog/objectaddress.c | 65 +++-
src/backend/commands/Makefile | 6 +-
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/tablecmds.c | 1 +
src/backend/commands/tablesample.c | 422 +++++++++++++++++++++
src/backend/parser/gram.y | 14 +-
src/backend/tcop/utility.c | 12 +
src/bin/pg_dump/common.c | 5 +
src/bin/pg_dump/pg_dump.c | 171 +++++++++
src/bin/pg_dump/pg_dump.h | 10 +-
src/bin/pg_dump/pg_dump_sort.c | 11 +-
src/include/catalog/dependency.h | 1 +
src/include/catalog/pg_tablesample_method.h | 11 +
src/include/nodes/parsenodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/modules/Makefile | 3 +-
src/test/modules/tablesample/.gitignore | 4 +
src/test/modules/tablesample/Makefile | 21 +
.../modules/tablesample/expected/tablesample.out | 38 ++
src/test/modules/tablesample/sql/tablesample.sql | 14 +
src/test/modules/tablesample/tsm_test--1.0.sql | 51 +++
src/test/modules/tablesample/tsm_test.c | 228 +++++++++++
src/test/modules/tablesample/tsm_test.control | 5 +
29 files changed, 1370 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/create_tablesamplemethod.sgml
create mode 100644 doc/src/sgml/ref/drop_tablesamplemethod.sgml
create mode 100644 src/backend/commands/tablesample.c
create mode 100644 src/test/modules/tablesample/.gitignore
create mode 100644 src/test/modules/tablesample/Makefile
create mode 100644 src/test/modules/tablesample/expected/tablesample.out
create mode 100644 src/test/modules/tablesample/sql/tablesample.sql
create mode 100644 src/test/modules/tablesample/tsm_test--1.0.sql
create mode 100644 src/test/modules/tablesample/tsm_test.c
create mode 100644 src/test/modules/tablesample/tsm_test.control
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 7aa3128..2fad084 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -78,6 +78,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createServer SYSTEM "create_server.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
+<!ENTITY createTablesampleMethod SYSTEM "create_tablesamplemethod.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
<!ENTITY createTrigger SYSTEM "create_trigger.sgml">
<!ENTITY createTSConfig SYSTEM "create_tsconfig.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
+<!ENTITY dropTablesampleMethod SYSTEM "drop_tablesamplemethod.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTrigger SYSTEM "drop_trigger.sgml">
<!ENTITY dropTSConfig SYSTEM "drop_tsconfig.sgml">
diff --git a/doc/src/sgml/ref/create_tablesamplemethod.sgml b/doc/src/sgml/ref/create_tablesamplemethod.sgml
new file mode 100644
index 0000000..62a8ce4
--- /dev/null
+++ b/doc/src/sgml/ref/create_tablesamplemethod.sgml
@@ -0,0 +1,173 @@
+<!--
+doc/src/sgml/ref/create_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATETABLESAMPLEMETHOD">
+ <indexterm zone="sql-createtablesamplemethod">
+ <primary>CREATE TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE TABLESAMPLE METHOD</refname>
+ <refpurpose>define a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE TABLESAMPLE METHOD <replaceable class="parameter">name</replaceable> (
+ INIT = <replaceable class="parameter">init_function</replaceable> ,
+ NEXTBLOCK = <replaceable class="parameter">nextblock_function</replaceable> ,
+ NEXTTUPLE = <replaceable class="parameter">nexttuple_function</replaceable> ,
+ END = <replaceable class="parameter">end_function</replaceable> ,
+ RESET = <replaceable class="parameter">reset_function</replaceable> ,
+ COST = <replaceable class="parameter">cost_function</replaceable>
+ [ , EXAMINETUPLE = <replaceable class="parameter">examinetuple_function</replaceable> ]
+ [ , SEQSCAN = <replaceable class="parameter">seqscan</replaceable> ]
+)
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>CREATE TABLESAMPLE METHOD</command> creates a tablesample method.
+ A tablesample method provides the algorithm for reading a sample of a table
+ when used in the <command>TABLESAMPLE</> clause of a <command>SELECT</>
+ statement.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>CREATE TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of the tablesample method to be created. This name must be
+ unique within the database.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">init_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the init function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nextblock_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-block function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nexttuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-tuple function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">end_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the end function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">reset_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the reset function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">cost_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the costing function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">examinetuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the function that inspects the tuple contents to
+ decide whether the tuple should be returned. This parameter
+ is optional.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">seqscan</replaceable></term>
+ <listitem>
+ <para>
+ True if the sampling method does a sequential scan of the whole table.
+ Used for cost estimation and syncscan. If not specified, the default
+ is false.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>
+ The function names can be schema-qualified if necessary. Argument types
+ are not given, since the argument list for each type of function is
+ predetermined. All functions except <replaceable class="parameter">examinetuple_function</replaceable> are required.
+ </para>
+
+ <para>
+ The parameters can appear in any order, not only the one shown above.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>CREATE TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-droptablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_tablesamplemethod.sgml b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
new file mode 100644
index 0000000..dffd2ec
--- /dev/null
+++ b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
@@ -0,0 +1,87 @@
+<!--
+doc/src/sgml/ref/drop_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPTABLESAMPLEMETHOD">
+ <indexterm zone="sql-droptablesamplemethod">
+ <primary>DROP TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP TABLESAMPLE METHOD</refname>
+ <refpurpose>remove a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP TABLESAMPLE METHOD [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP TABLESAMPLE METHOD</command> drops an existing tablesample
+ method.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>DROP TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the tablesample method does not exist.
+ A notice is issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an existing tablesample method to be removed.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>DROP TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createtablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 10c9a6d..2c09a3c 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -106,6 +106,7 @@
&createServer;
&createTable;
&createTableAs;
+ &createTablesampleMethod;
&createTableSpace;
&createTSConfig;
&createTSDictionary;
@@ -147,6 +148,7 @@
&dropSequence;
&dropServer;
&dropTable;
+ &dropTablesampleMethod;
&dropTableSpace;
&dropTSConfig;
&dropTSDictionary;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index bacb242..6acb5b3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -46,6 +46,7 @@
#include "catalog/pg_policy.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -157,7 +158,8 @@ static const Oid object_classes[MAX_OCLASS] = {
DefaultAclRelationId, /* OCLASS_DEFACL */
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
- PolicyRelationId /* OCLASS_POLICY */
+ PolicyRelationId, /* OCLASS_POLICY */
+ TableSampleMethodRelationId /* OCLASS_TABLESAMPLEMETHOD */
};
@@ -1265,6 +1267,10 @@ doDeletion(const ObjectAddress *object, int flags)
RemovePolicyById(object->objectId);
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ RemoveTablesampleMethodById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -1794,6 +1800,10 @@ find_expr_references_walker(Node *node,
case RTE_RELATION:
add_object_address(OCLASS_CLASS, rte->relid, 0,
context->addrs);
+ if (rte->tablesample)
+ add_object_address(OCLASS_TABLESAMPLEMETHOD,
+ rte->tablesample->tsmid, 0,
+ context->addrs);
break;
default:
break;
@@ -2373,6 +2383,9 @@ getObjectClass(const ObjectAddress *object)
case PolicyRelationId:
return OCLASS_POLICY;
+
+ case TableSampleMethodRelationId:
+ return OCLASS_TABLESAMPLEMETHOD;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 825d8b2..02edc0a 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -44,6 +44,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_policy.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -429,7 +430,19 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
- }
+ },
+ {
+ TableSampleMethodRelationId,
+ TableSampleMethodOidIndexId,
+ TABLESAMPLEMETHODOID,
+ TABLESAMPLEMETHODNAME,
+ Anum_pg_tablesample_method_tsmname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ true
+ },
};
/*
@@ -528,7 +541,9 @@ ObjectTypeMap[] =
/* OCLASS_EVENT_TRIGGER */
{ "event trigger", OBJECT_EVENT_TRIGGER },
/* OCLASS_POLICY */
- { "policy", OBJECT_POLICY }
+ { "policy", OBJECT_POLICY },
+ /* OCLASS_TABLESAMPLEMETHOD */
+ { "tablesample method", OBJECT_TABLESAMPLEMETHOD }
};
@@ -670,6 +685,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_FDW:
case OBJECT_FOREIGN_SERVER:
case OBJECT_EVENT_TRIGGER:
+ case OBJECT_TABLESAMPLEMETHOD:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -896,6 +912,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_EVENT_TRIGGER:
msg = gettext_noop("event trigger name cannot be qualified");
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -956,6 +975,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_event_trigger_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ address.classId = TableSampleMethodRelationId;
+ address.objectId = get_tablesample_method_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -1720,6 +1744,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
break;
case OBJECT_TSPARSER:
case OBJECT_TSTEMPLATE:
+ case OBJECT_TABLESAMPLEMETHOD:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -2654,6 +2679,21 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("tablesample method %s"),
+ NameStr(((Form_pg_tablesample_method) GETSTRUCT(tup))->tsmname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -3131,6 +3171,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "policy");
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ appendStringInfoString(&buffer, "tablesample method");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4025,6 +4069,23 @@ getObjectIdentityParts(const ObjectAddress *object,
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+ Form_pg_tablesample_method tsmForm;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ tsmForm = (Form_pg_tablesample_method) GETSTRUCT(tup);
+ appendStringInfoString(&buffer,
+ quote_identifier(NameStr(tsmForm->tsmname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..04fcd8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o tablecmds.o tablesample.o \
+ tablespace.o trigger.o tsearchcmds.o typecmds.o user.o vacuum.o \
+ vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index e5185ba..04d29a2 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -421,6 +421,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
args = strVal(linitial(objargs));
}
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
default:
elog(ERROR, "unexpected object type (%d)", (int) objtype);
break;
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index a33a5ad..f20e9f7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -97,6 +97,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SEQUENCE", true},
{"SERVER", true},
{"TABLE", true},
+ {"TABLESAMPLE METHOD", true},
{"TABLESPACE", false},
{"TRIGGER", true},
{"TEXT SEARCH CONFIGURATION", true},
@@ -1078,6 +1079,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_SEQUENCE:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
+ case OBJECT_TABLESAMPLEMETHOD:
case OBJECT_TRIGGER:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
@@ -1134,6 +1136,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_TABLESAMPLEMETHOD:
return true;
case MAX_OCLASS:
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 66d5083..b67c560 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8059,6 +8059,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
case OCLASS_USER_MAPPING:
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
+ case OCLASS_TABLESAMPLEMETHOD:
/*
* We don't expect any of these sorts of objects to depend on
diff --git a/src/backend/commands/tablesample.c b/src/backend/commands/tablesample.c
new file mode 100644
index 0000000..ed8102d
--- /dev/null
+++ b/src/backend/commands/tablesample.c
@@ -0,0 +1,422 @@
+/*-------------------------------------------------------------------------
+ *
+ * tablesample.c
+ * Commands to manipulate tablesample methods
+ *
+ * Table sampling methods provide algorithms for performing a sample scan
+ * over a table.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/tablesample.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "parser/parse_func.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+
+
+static Datum
+get_tablesample_method_func(DefElem *defel, int attnum)
+{
+ List *funcName = defGetQualifiedName(defel);
+ /* Big enough size for our needs. */
+ Oid *typeId = palloc0(7 * sizeof(Oid));
+ Oid retTypeId;
+ int nargs;
+ Oid procOid = InvalidOid;
+ FuncCandidateList clist;
+
+ switch (attnum)
+ {
+ case Anum_pg_tablesample_method_tsminit:
+ /*
+ * tsminit needs special handling: it may take three or more
+ * arguments, and only the first two must have specific types. The
+ * rest are up to the author of the tablesample method.
+ */
+ {
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ retTypeId = VOIDOID;
+
+ clist = FuncnameGetCandidates(funcName, -1, NIL, false, false, false);
+
+ while (clist)
+ {
+ if (clist->nargs >= 3 &&
+ memcmp(typeId, clist->args, nargs * sizeof(Oid)) == 0)
+ {
+ procOid = clist->oid;
+ /* Save real function signature for future errors. */
+ nargs = clist->nargs;
+ pfree(typeId);
+ typeId = clist->args;
+ break;
+ }
+ clist = clist->next;
+ }
+
+ if (!OidIsValid(procOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("function \"%s\" does not exist or does not have a valid signature",
+ NameListToString(funcName)),
+ errhint("The tablesample method init function "
+ "must have at least 3 input parameters, "
+ "with the first of type INTERNAL and the second of type INTEGER.")));
+ }
+ break;
+
+ case Anum_pg_tablesample_method_tsmnextblock:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = INT4OID;
+ break;
+ case Anum_pg_tablesample_method_tsmnexttuple:
+ nargs = 3;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INT2OID;
+ retTypeId = INT2OID;
+ break;
+ case Anum_pg_tablesample_method_tsmexaminetuple:
+ nargs = 4;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = BOOLOID;
+ retTypeId = BOOLOID;
+ break;
+ case Anum_pg_tablesample_method_tsmend:
+ case Anum_pg_tablesample_method_tsmreset:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ case Anum_pg_tablesample_method_tsmcost:
+ nargs = 7;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INTERNALOID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = INTERNALOID;
+ typeId[4] = INTERNALOID;
+ typeId[5] = INTERNALOID;
+ typeId[6] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ default:
+ /* should not be here */
+ elog(ERROR, "unrecognized attribute for tablesample method: %d",
+ attnum);
+ nargs = 0; /* keep compiler quiet */
+ }
+
+ if (!OidIsValid(procOid))
+ procOid = LookupFuncName(funcName, nargs, typeId, false);
+ if (get_func_rettype(procOid) != retTypeId)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("function %s should return type %s",
+ func_signature_string(funcName, nargs, NIL, typeId),
+ format_type_be(retTypeId))));
+
+ return ObjectIdGetDatum(procOid);
+}
+
+/*
+ * make pg_depend entries for a new pg_tablesample_method entry
+ */
+static void
+makeTablesampleMethodDeps(HeapTuple tuple)
+{
+ Form_pg_tablesample_method tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ ObjectAddress myself,
+ referenced;
+
+ myself.classId = TableSampleMethodRelationId;
+ myself.objectId = HeapTupleGetOid(tuple);
+ myself.objectSubId = 0;
+
+ /* dependency on extension */
+ recordDependencyOnCurrentExtension(&myself, false);
+
+ /* dependencies on functions */
+ referenced.classId = ProcedureRelationId;
+ referenced.objectSubId = 0;
+
+ referenced.objectId = tsm->tsminit;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnextblock;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnexttuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ if (OidIsValid(tsm->tsmexaminetuple))
+ {
+ referenced.objectId = tsm->tsmexaminetuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ }
+
+ referenced.objectId = tsm->tsmend;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmreset;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmcost;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+}
+
+/*
+ * Create a table sampling method
+ *
+ * Only superusers can create table sampling methods.
+ */
+Oid
+DefineTablesampleMethod(List *names, List *parameters)
+{
+ char *tsmname = strVal(linitial(names));
+ Oid tsmoid;
+ ListCell *pl;
+ Relation rel;
+ Datum values[Natts_pg_tablesample_method];
+ bool nulls[Natts_pg_tablesample_method];
+ HeapTuple tuple;
+
+ /* Must be super user. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to create tablesample method \"%s\"",
+ tsmname),
+ errhint("Must be superuser to create a tablesample method.")));
+
+ /* Must not already exist. */
+ tsmoid = get_tablesample_method_oid(tsmname, true);
+ if (OidIsValid(tsmoid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("tablesample method \"%s\" already exists",
+ tsmname)));
+
+ /* Initialize the values. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_tablesample_method_tsmname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(tsmname));
+
+ /*
+ * loop over the definition list and extract the information we need.
+ */
+ foreach(pl, parameters)
+ {
+ DefElem *defel = (DefElem *) lfirst(pl);
+
+ if (pg_strcasecmp(defel->defname, "seqscan") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmseqscan - 1] =
+ BoolGetDatum(defGetBoolean(defel));
+ }
+ else if (pg_strcasecmp(defel->defname, "init") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsminit - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsminit);
+ }
+ else if (pg_strcasecmp(defel->defname, "nextblock") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnextblock - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnextblock);
+ }
+ else if (pg_strcasecmp(defel->defname, "nexttuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnexttuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnexttuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "examinetuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmexaminetuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmexaminetuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "end") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmend - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmend);
+ }
+ else if (pg_strcasecmp(defel->defname, "reset") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmreset - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmreset);
+ }
+ else if (pg_strcasecmp(defel->defname, "cost") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmcost - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmcost);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("tablesample method parameter \"%s\" not recognized",
+ defel->defname)));
+ }
+
+ /*
+ * Validation.
+ */
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsminit - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method init function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnextblock - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nextblock function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnexttuple - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nexttuple function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmend - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method end function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmreset - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method reset function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmcost - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method cost function is required")));
+
+ /*
+ * Insert tuple into pg_tablesample_method.
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = heap_form_tuple(rel->rd_att, values, nulls);
+
+ tsmoid = simple_heap_insert(rel, tuple);
+
+ CatalogUpdateIndexes(rel, tuple);
+
+ makeTablesampleMethodDeps(tuple);
+
+ heap_freetuple(tuple);
+
+ /* Post creation hook for new tablesample method */
+ InvokeObjectPostCreateHook(TableSampleMethodRelationId, tsmoid, 0);
+
+ heap_close(rel, RowExclusiveLock);
+
+ return tsmoid;
+}
+
+/*
+ * Drop a tablesample method.
+ */
+void
+RemoveTablesampleMethodById(Oid tsmoid)
+{
+ Relation rel;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+
+ /*
+ * Find the target tuple
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tsmoid));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ tsmoid);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ /* Can't drop builtin tablesample methods. */
+ if (tsmoid == TABLESAMPLE_METHOD_SYSTEM_OID ||
+ tsmoid == TABLESAMPLE_METHOD_BERNOULLI_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied for tablesample method %s",
+ NameStr(tsm->tsmname))));
+
+ /*
+ * Remove the pg_tablesample_method tuple (this will roll back if we fail below)
+ */
+ simple_heap_delete(rel, &tuple->t_self);
+
+ ReleaseSysCache(tuple);
+
+ heap_close(rel, RowExclusiveLock);
+}
+
+/*
+ * get_tablesample_method_oid - given a tablesample method name,
+ * look up the OID
+ *
+ * If missing_ok is false, throw an error if tablesample method name not found.
+ * If true, just return InvalidOid.
+ */
+Oid
+get_tablesample_method_oid(const char *tsmname, bool missing_ok)
+{
+ Oid result;
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(tsmname));
+ if (HeapTupleIsValid(tuple))
+ {
+ result = HeapTupleGetOid(tuple);
+ ReleaseSysCache(tuple);
+ }
+ else
+ result = InvalidOid;
+
+ if (!OidIsValid(result) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ tsmname)));
+
+ return result;
+}
+
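Review note on get_tablesample_method_func(): the init lookup accepts any candidate with at least three arguments whose first two are (internal, integer). Below is a minimal standalone sketch of that rule, for illustration only. The Candidate struct and find_init_candidate() are hypothetical stand-ins for FuncCandidateList and the loop in the patch, not backend code; the two type OIDs are the well-known pg_type values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;

#define InvalidOid   ((Oid) 0)
#define INT4OID      ((Oid) 23)     /* pg_type OID of integer */
#define INTERNALOID  ((Oid) 2281)   /* pg_type OID of internal */

/* Hypothetical stand-in for FuncCandidateList; only the fields used here. */
typedef struct Candidate
{
    struct Candidate *next;
    Oid         oid;            /* function OID */
    int         nargs;          /* number of declared arguments */
    const Oid  *args;           /* argument type OIDs, nargs entries */
} Candidate;

/*
 * Return the OID of the first candidate that takes at least three
 * arguments and whose first two are (internal, integer).  Return
 * InvalidOid if no candidate qualifies.
 */
static Oid
find_init_candidate(const Candidate *clist)
{
    static const Oid required[2] = {INTERNALOID, INT4OID};

    for (; clist != NULL; clist = clist->next)
    {
        assert(clist->nargs >= 0);

        /* The nargs test runs first, so args is only read when long enough. */
        if (clist->nargs >= 3 &&
            memcmp(required, clist->args, sizeof(required)) == 0)
            return clist->oid;
    }
    return InvalidOid;
}
```

Only the two-element prefix is compared, so the trailing argument types are never checked at CREATE time. A function declared as (internal, integer) with no further arguments is rejected, which matches the errhint text.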
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ac5e095..4578b5e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -586,7 +586,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
- MAPPING MATCH MATERIALIZED MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
+ MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P
+ MOVE
NAME_P NAMES NATIONAL NATURAL NCHAR NEXT NO NONE
NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -5094,6 +5095,15 @@ DefineStmt:
n->definition = list_make1(makeDefElem("from", (Node *) $5));
$$ = (Node *)n;
}
+ | CREATE TABLESAMPLE METHOD name definition
+ {
+ DefineStmt *n = makeNode(DefineStmt);
+ n->kind = OBJECT_TABLESAMPLEMETHOD;
+ n->args = NIL;
+ n->defnames = list_make1(makeString($4));
+ n->definition = $5;
+ $$ = (Node *)n;
+ }
;
definition: '(' def_list ')' { $$ = $2; }
@@ -5552,6 +5562,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | TABLESAMPLE METHOD { $$ = OBJECT_TABLESAMPLEMETHOD; }
;
any_name_list:
@@ -13313,6 +13324,7 @@ unreserved_keyword:
| MATCH
| MATERIALIZED
| MAXVALUE
+ | METHOD
| MINUTE_P
| MINVALUE
| MODE
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 3533cfa..532256d 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -23,6 +23,7 @@
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/namespace.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/toasting.h"
#include "commands/alter.h"
#include "commands/async.h"
@@ -1106,6 +1107,11 @@ ProcessUtilitySlow(Node *parsetree,
Assert(stmt->args == NIL);
DefineCollation(stmt->defnames, stmt->definition);
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ Assert(stmt->args == NIL);
+ Assert(list_length(stmt->defnames) == 1);
+ DefineTablesampleMethod(stmt->defnames, stmt->definition);
+ break;
default:
elog(ERROR, "unrecognized define stmt type: %d",
(int) stmt->kind);
@@ -1960,6 +1966,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_POLICY:
tag = "DROP POLICY";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "DROP TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
@@ -2056,6 +2065,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_COLLATION:
tag = "CREATE COLLATION";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "CREATE TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 1a0a587..8a64e4b 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -103,6 +103,7 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
int numForeignServers;
int numDefaultACLs;
int numEventTriggers;
+ int numTSMs;
if (g_verbose)
write_msg(NULL, "reading schemas\n");
@@ -251,6 +252,10 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
write_msg(NULL, "reading policies\n");
getPolicies(fout, tblinfo, numTables);
+ if (g_verbose)
+ write_msg(NULL, "reading tablesample methods\n");
+ getTableSampleMethods(fout, &numTSMs);
+
*numTablesPtr = numTables;
return tblinfo;
}
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7e92b74..9f19799 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -182,6 +182,7 @@ static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo);
static void dumpIndex(Archive *fout, DumpOptions *dopt, IndxInfo *indxinfo);
static void dumpConstraint(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
static void dumpTableConstraintComment(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
+static void dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo);
static void dumpTSParser(Archive *fout, DumpOptions *dopt, TSParserInfo *prsinfo);
static void dumpTSDictionary(Archive *fout, DumpOptions *dopt, TSDictInfo *dictinfo);
static void dumpTSTemplate(Archive *fout, DumpOptions *dopt, TSTemplateInfo *tmplinfo);
@@ -7174,6 +7175,75 @@ getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tblinfo, int numTable
}
/*
+ * getTableSampleMethods:
+ * read all tablesample methods in the system catalogs and return them
+ * in the TSMInfo* structure
+ *
+ * numTSMs is set to the number of tablesample methods read in
+ */
+TSMInfo *
+getTableSampleMethods(Archive *fout, int *numTSMs)
+{
+ PGresult *res;
+ int ntups;
+ int i;
+ PQExpBuffer query;
+ TSMInfo *tsminfo;
+ int i_tableoid,
+ i_oid,
+ i_tsmname,
+ i_tsmseqscan;
+
+ /* Before 9.5, there were no tablesample methods */
+ if (fout->remoteVersion < 90500)
+ {
+ *numTSMs = 0;
+ return NULL;
+ }
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT tableoid, oid, tsmname, tsmseqscan "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid >= '%u'::pg_catalog.oid",
+ FirstNormalObjectId);
+
+ res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+ ntups = PQntuples(res);
+ *numTSMs = ntups;
+
+ tsminfo = (TSMInfo *) pg_malloc(ntups * sizeof(TSMInfo));
+
+ i_tableoid = PQfnumber(res, "tableoid");
+ i_oid = PQfnumber(res, "oid");
+ i_tsmname = PQfnumber(res, "tsmname");
+ i_tsmseqscan = PQfnumber(res, "tsmseqscan");
+
+ for (i = 0; i < ntups; i++)
+ {
+ tsminfo[i].dobj.objType = DO_TABLESAMPLE_METHOD;
+ tsminfo[i].dobj.catId.tableoid = atooid(PQgetvalue(res, i, i_tableoid));
+ tsminfo[i].dobj.catId.oid = atooid(PQgetvalue(res, i, i_oid));
+ AssignDumpId(&tsminfo[i].dobj);
+ tsminfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_tsmname));
+ tsminfo[i].dobj.namespace = NULL;
+ tsminfo[i].tsmseqscan = PQgetvalue(res, i, i_tsmseqscan)[0] == 't';
+
+ /* Decide whether we want to dump it */
+ selectDumpableObject(&(tsminfo[i].dobj));
+ }
+
+ PQclear(res);
+
+ destroyPQExpBuffer(query);
+
+ return tsminfo;
+}
+
+
+/*
* Test whether a column should be printed as part of table's CREATE TABLE.
* Column number is zero-based.
*
@@ -8266,6 +8336,9 @@ dumpDumpableObject(Archive *fout, DumpOptions *dopt, DumpableObject *dobj)
case DO_POLICY:
dumpPolicy(fout, dopt, (PolicyInfo *) dobj);
break;
+ case DO_TABLESAMPLE_METHOD:
+ dumpTableSampleMethod(fout, dopt, (TSMInfo *) dobj);
+ break;
case DO_PRE_DATA_BOUNDARY:
case DO_POST_DATA_BOUNDARY:
/* never dumped, nothing to do */
@@ -12266,6 +12339,103 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
}
/*
+ * dumpTableSampleMethod
+ * write the declaration of one user-defined tablesample method
+ */
+static void
+dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo)
+{
+ PGresult *res;
+ PQExpBuffer q;
+ PQExpBuffer delq;
+ PQExpBuffer labelq;
+ PQExpBuffer query;
+ char *tsminit;
+ char *tsmnextblock;
+ char *tsmnexttuple;
+ char *tsmexaminetuple;
+ char *tsmend;
+ char *tsmreset;
+ char *tsmcost;
+
+ /* Skip if not to be dumped */
+ if (!tsminfo->dobj.dump || dopt->dataOnly)
+ return;
+
+ q = createPQExpBuffer();
+ delq = createPQExpBuffer();
+ labelq = createPQExpBuffer();
+ query = createPQExpBuffer();
+
+ /* Make sure we are in proper schema */
+ selectSourceSchema(fout, "pg_catalog");
+
+ appendPQExpBuffer(query, "SELECT tsminit, tsmnextblock, "
+ "tsmnexttuple, tsmexaminetuple, "
+ "tsmend, tsmreset, tsmcost "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid = '%u'::pg_catalog.oid",
+ tsminfo->dobj.catId.oid);
+
+ res = ExecuteSqlQueryForSingleRow(fout, query->data);
+
+ tsminit = PQgetvalue(res, 0, PQfnumber(res, "tsminit"));
+ tsmnexttuple = PQgetvalue(res, 0, PQfnumber(res, "tsmnexttuple"));
+ tsmnextblock = PQgetvalue(res, 0, PQfnumber(res, "tsmnextblock"));
+ tsmexaminetuple = PQgetvalue(res, 0, PQfnumber(res, "tsmexaminetuple"));
+ tsmend = PQgetvalue(res, 0, PQfnumber(res, "tsmend"));
+ tsmreset = PQgetvalue(res, 0, PQfnumber(res, "tsmreset"));
+ tsmcost = PQgetvalue(res, 0, PQfnumber(res, "tsmcost"));
+
+ appendPQExpBuffer(q, "CREATE TABLESAMPLE METHOD %s (\n",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(q, " INIT = %s,\n", tsminit);
+ appendPQExpBuffer(q, " NEXTTUPLE = %s,\n", tsmnexttuple);
+ appendPQExpBuffer(q, " NEXTBLOCK = %s,\n", tsmnextblock);
+ appendPQExpBuffer(q, " END = %s,\n", tsmend);
+ appendPQExpBuffer(q, " RESET = %s,\n", tsmreset);
+ appendPQExpBuffer(q, " COST = %s", tsmcost);
+
+ if (strcmp(tsmexaminetuple, "-") != 0)
+ appendPQExpBuffer(q, ",\n EXAMINETUPLE = %s", tsmexaminetuple);
+
+ if (tsminfo->tsmseqscan)
+ appendPQExpBufferStr(q, ",\n SEQSCAN = true");
+
+ appendPQExpBufferStr(q, "\n);\n");
+
+ appendPQExpBuffer(delq, "DROP TABLESAMPLE METHOD %s;\n",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(labelq, "TABLESAMPLE METHOD %s",
+ fmtId(tsminfo->dobj.name));
+
+ if (dopt->binary_upgrade)
+ binary_upgrade_extension_member(q, &tsminfo->dobj, labelq->data);
+
+ ArchiveEntry(fout, tsminfo->dobj.catId, tsminfo->dobj.dumpId,
+ tsminfo->dobj.name,
+ NULL,
+ NULL,
+ "",
+ false, "TABLESAMPLE METHOD", SECTION_PRE_DATA,
+ q->data, delq->data, NULL,
+ NULL, 0,
+ NULL, NULL);
+
+ /* Dump tablesample method comments */
+ dumpComment(fout, dopt, labelq->data,
+ NULL, "",
+ tsminfo->dobj.catId, 0, tsminfo->dobj.dumpId);
+
+ PQclear(res);
+ destroyPQExpBuffer(q);
+ destroyPQExpBuffer(delq);
+ destroyPQExpBuffer(labelq);
+}
+
+/*
* dumpTSParser
* write out a single text search parser
*/
@@ -15619,6 +15789,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
case DO_FDW:
case DO_FOREIGN_SERVER:
case DO_BLOB:
+ case DO_TABLESAMPLE_METHOD:
/* Pre-data objects: must come before the pre-data boundary */
addObjectDependency(preDataBound, dobj->dumpId);
break;
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index f42c42d..645c07c 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -76,7 +76,8 @@ typedef enum
DO_POST_DATA_BOUNDARY,
DO_EVENT_TRIGGER,
DO_REFRESH_MATVIEW,
- DO_POLICY
+ DO_POLICY,
+ DO_TABLESAMPLE_METHOD
} DumpableObjectType;
typedef struct _dumpableObject
@@ -384,6 +385,12 @@ typedef struct _inhInfo
Oid inhparent; /* OID of its parent */
} InhInfo;
+typedef struct _tsmInfo
+{
+ DumpableObject dobj;
+ bool tsmseqscan;
+} TSMInfo;
+
typedef struct _prsInfo
{
DumpableObject dobj;
@@ -537,6 +544,7 @@ extern ProcLangInfo *getProcLangs(Archive *fout, int *numProcLangs);
extern CastInfo *getCasts(Archive *fout, DumpOptions *dopt, int *numCasts);
extern void getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tbinfo, int numTables);
extern bool shouldPrintColumn(DumpOptions *dopt, TableInfo *tbinfo, int colno);
+extern TSMInfo *getTableSampleMethods(Archive *fout, int *numTSMs);
extern TSParserInfo *getTSParsers(Archive *fout, int *numTSParsers);
extern TSDictInfo *getTSDictionaries(Archive *fout, int *numTSDicts);
extern TSTemplateInfo *getTSTemplates(Archive *fout, int *numTSTemplates);
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 4b9bba0..cb009ce 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -73,7 +73,8 @@ static const int oldObjectTypePriority[] =
13, /* DO_POST_DATA_BOUNDARY */
20, /* DO_EVENT_TRIGGER */
15, /* DO_REFRESH_MATVIEW */
- 21 /* DO_POLICY */
+ 21, /* DO_POLICY */
+ 5 /* DO_TABLESAMPLE_METHOD */
};
/*
@@ -122,7 +123,8 @@ static const int newObjectTypePriority[] =
25, /* DO_POST_DATA_BOUNDARY */
32, /* DO_EVENT_TRIGGER */
33, /* DO_REFRESH_MATVIEW */
- 34 /* DO_POLICY */
+ 34, /* DO_POLICY */
+ 17 /* DO_TABLESAMPLE_METHOD */
};
static DumpId preDataBoundId;
@@ -1443,6 +1445,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
"POLICY (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
+ case DO_TABLESAMPLE_METHOD:
+ snprintf(buf, bufsize,
+ "TABLESAMPLE METHOD %s (ID %d OID %u)",
+ obj->name, obj->dumpId, obj->catId.oid);
+ return;
case DO_PRE_DATA_BOUNDARY:
snprintf(buf, bufsize,
"PRE-DATA BOUNDARY (ID %d)",
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 6481ac8..30653f8 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -148,6 +148,7 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
+ OCLASS_TABLESAMPLEMETHOD, /* pg_tablesample_method */
MAX_OCLASS /* MUST BE LAST */
} ObjectClass;
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
index fd76f77..6a55669 100644
--- a/src/include/catalog/pg_tablesample_method.h
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -69,7 +69,18 @@ typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
DATA(insert OID = 3283 ( system false tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
DESCR("SYSTEM table sampling method");
+#define TABLESAMPLE_METHOD_SYSTEM_OID 3283
DATA(insert OID = 3284 ( bernoulli true tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
DESCR("BERNOULLI table sampling method");
+#define TABLESAMPLE_METHOD_BERNOULLI_OID 3284
+
+/* ----------------
+ * functions for manipulation of pg_tablesample_method
+ * ----------------
+ */
+
+extern Oid DefineTablesampleMethod(List *names, List *parameters);
+extern void RemoveTablesampleMethodById(Oid tsmoid);
+extern Oid get_tablesample_method_oid(const char *tsmname, bool missing_ok);
#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2f4df1d..bc1fef2 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1269,6 +1269,7 @@ typedef enum ObjectType
OBJECT_SEQUENCE,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
+ OBJECT_TABLESAMPLEMETHOD,
OBJECT_TABLESPACE,
OBJECT_TRIGGER,
OBJECT_TSCONFIGURATION,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 6ff7b44..c3269c0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -236,6 +236,7 @@ PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD)
+PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD)
PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("minvalue", MINVALUE, UNRESERVED_KEYWORD)
PG_KEYWORD("mode", MODE, UNRESERVED_KEYWORD)
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 93d93af..37ea524 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -9,7 +9,8 @@ SUBDIRS = \
worker_spi \
dummy_seclabel \
test_shm_mq \
- test_parser
+ test_parser \
+ tablesample
all: submake-errcodes
diff --git a/src/test/modules/tablesample/.gitignore b/src/test/modules/tablesample/.gitignore
new file mode 100644
index 0000000..5dcb3ff
--- /dev/null
+++ b/src/test/modules/tablesample/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/tablesample/Makefile b/src/test/modules/tablesample/Makefile
new file mode 100644
index 0000000..469b004
--- /dev/null
+++ b/src/test/modules/tablesample/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/tablesample/Makefile
+
+MODULE_big = tsm_test
+OBJS = tsm_test.o $(WIN32RES)
+PGFILEDESC = "tsm_test - example of a custom tablesample method"
+
+EXTENSION = tsm_test
+DATA = tsm_test--1.0.sql
+
+REGRESS = tablesample
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/tablesample/expected/tablesample.out b/src/test/modules/tablesample/expected/tablesample.out
new file mode 100644
index 0000000..3cb3848
--- /dev/null
+++ b/src/test/modules/tablesample/expected/tablesample.out
@@ -0,0 +1,38 @@
+CREATE EXTENSION tsm_test;
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ c81e728d9d4c2f636f067f89cc14862c | 0.5
+ a87ff679a2f3e71d9181a67b7542122c | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ d3d9446802a44259755d38e6d163e820 | 0.5
+(6 rows)
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ e4da3b7fbbce2345d7772b0674a318d5 | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ c9f0f895fb98ab9159f51fd0297e236d | 0.5
+(5 rows)
+
+DROP TABLESAMPLE METHOD tsm_test;
+ERROR: cannot drop tablesample method tsm_test because extension tsm_test requires it
+HINT: You can drop extension tsm_test instead.
+DROP EXTENSION tsm_test;
+ERROR: cannot drop extension tsm_test because other objects depend on it
+DETAIL: view test_tsm_v depends on tablesample method tsm_test
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
+DROP EXTENSION tsm_test CASCADE;
+NOTICE: drop cascades to view test_tsm_v
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
+ tsmname | tsmseqscan | tsminit | tsmnextblock | tsmnexttuple | tsmexaminetuple | tsmend | tsmreset | tsmcost
+---------+------------+---------+--------------+--------------+-----------------+--------+----------+---------
+(0 rows)
+
diff --git a/src/test/modules/tablesample/sql/tablesample.sql b/src/test/modules/tablesample/sql/tablesample.sql
new file mode 100644
index 0000000..b1104d6
--- /dev/null
+++ b/src/test/modules/tablesample/sql/tablesample.sql
@@ -0,0 +1,14 @@
+CREATE EXTENSION tsm_test;
+
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+
+DROP TABLESAMPLE METHOD tsm_test;
+DROP EXTENSION tsm_test;
+DROP EXTENSION tsm_test CASCADE;
+
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
diff --git a/src/test/modules/tablesample/tsm_test--1.0.sql b/src/test/modules/tablesample/tsm_test--1.0.sql
new file mode 100644
index 0000000..2280ab0
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test--1.0.sql
@@ -0,0 +1,51 @@
+/* src/test/modules/tablesample/tsm_test--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION tsm_test" to load this file. \quit
+
+CREATE FUNCTION tsm_test_init(internal, int4, text)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nextblock(internal)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nexttuple(internal, int4, int2)
+RETURNS int2
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_examinetuple(internal, int4, internal, bool)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_end(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_reset(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_cost(internal, internal, internal, internal, internal, internal, internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+
+CREATE TABLESAMPLE METHOD tsm_test (
+ SEQSCAN = true,
+ INIT = tsm_test_init,
+ NEXTBLOCK = tsm_test_nextblock,
+ NEXTTUPLE = tsm_test_nexttuple,
+ EXAMINETUPLE = tsm_test_examinetuple,
+ END = tsm_test_end,
+ RESET = tsm_test_reset,
+ COST = tsm_test_cost
+);
diff --git a/src/test/modules/tablesample/tsm_test.c b/src/test/modules/tablesample/tsm_test.c
new file mode 100644
index 0000000..be4dcb9
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.c
@@ -0,0 +1,228 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_test.c
+ * Simple example of a custom tablesample method
+ *
+ * Copyright (c) 2007-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/tablesample/tsm_test.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "access/relscan.h"
+#include "access/tupdesc.h"
+#include "catalog/pg_type.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+PG_MODULE_MAGIC;
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ AttrNumber attnum; /* column to check */
+ TupleDesc tupDesc; /* tuple descriptor of table */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} tsm_test_state;
+
+
+PG_FUNCTION_INFO_V1(tsm_test_init);
+PG_FUNCTION_INFO_V1(tsm_test_nextblock);
+PG_FUNCTION_INFO_V1(tsm_test_nexttuple);
+PG_FUNCTION_INFO_V1(tsm_test_examinetuple);
+PG_FUNCTION_INFO_V1(tsm_test_end);
+PG_FUNCTION_INFO_V1(tsm_test_reset);
+PG_FUNCTION_INFO_V1(tsm_test_cost);
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_test_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ char *attname;
+ AttrNumber attnum;
+ Oid atttype;
+ Relation rel = scanstate->ss.ss_currentRelation;
+ HeapScanDesc scan = scanstate->ss.ss_currentScanDesc;
+ tsm_test_state *state;
+ TupleDesc tupDesc = RelationGetDescr(rel);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("Column name cannot be NULL.")));
+
+ attname = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+ attnum = get_attnum(rel->rd_id, attname);
+ if (!AttrNumberIsForUserDefinedAttr(attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s does not exist.", attname)));
+
+ atttype = get_atttype(rel->rd_id, attnum);
+ if (atttype != FLOAT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s is not of type float.", attname)));
+
+ state = palloc0(sizeof(tsm_test_state));
+
+ /* Remember initial values for reinit */
+ state->seed = seed;
+ state->attnum = attnum;
+ state->tupDesc = tupDesc;
+ state->startblock = scan->rs_startblock;
+ state->nblocks = scan->rs_nblocks;
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+ sampler_random_init_state(state->seed, state->randstate);
+
+ scanstate->tsmdata = (void *) state;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_test_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ /* Cycle from startblock to startblock to support syncscan. */
+ if (state->blockno == InvalidBlockNumber)
+ state->blockno = state->startblock;
+ else
+ {
+ state->blockno++;
+
+ if (state->blockno >= state->nblocks)
+ state->blockno = 0;
+
+ if (state->blockno == state->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(state->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ */
+Datum
+tsm_test_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ if (state->lt == InvalidOffsetNumber)
+ state->lt = FirstOffsetNumber;
+ else if (++state->lt > maxoffset)
+ PG_RETURN_UINT16(InvalidOffsetNumber);
+
+ PG_RETURN_UINT16(state->lt);
+}
+
+/*
+ * Examine tuple and decide if it should be returned.
+ */
+Datum
+tsm_test_examinetuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ HeapTuple tuple = (HeapTuple) PG_GETARG_POINTER(2);
+ bool visible = PG_GETARG_BOOL(3);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+ bool isnull;
+ float8 val, rand;
+
+ if (!visible)
+ PG_RETURN_BOOL(false);
+
+ val = DatumGetFloat8(heap_getattr(tuple, state->attnum, state->tupDesc, &isnull));
+ rand = sampler_random_fract(state->randstate);
+ if (isnull || val < rand)
+ PG_RETURN_BOOL(false);
+ else
+ PG_RETURN_BOOL(true);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_test_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_test_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+
+ sampler_random_init_state(state->seed, state->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_test_cost(PG_FUNCTION_ARGS)
+{
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+
+ *pages = baserel->pages;
+
+ /* This is a very crude estimate */
+ *tuples = path->rows = path->rows/2;
+
+ PG_RETURN_VOID();
+}
+
diff --git a/src/test/modules/tablesample/tsm_test.control b/src/test/modules/tablesample/tsm_test.control
new file mode 100644
index 0000000..a7b2741
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.control
@@ -0,0 +1,5 @@
+# tsm_test extension
+comment = 'test module for custom tablesample method'
+default_version = '1.0'
+module_pathname = '$libdir/tsm_test'
+relocatable = true
--
1.9.1
Tomas noticed that the patch was missing an error check for TABLESAMPLE
used on a view, so here is a new version that verifies it is only used
against a table or a matview.
No other changes.
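To make the intent of the new check concrete, here is a minimal standalone
sketch. It is not the code from the patch: it assumes the check boils down
to testing the relation's pg_class.relkind, and the demo_ names are made up
for illustration only.

```c
#include <stdbool.h>

/* relkind codes as stored in pg_class.relkind */
#define DEMO_RELKIND_RELATION	'r'		/* ordinary table */
#define DEMO_RELKIND_VIEW		'v'		/* view */
#define DEMO_RELKIND_MATVIEW	'm'		/* materialized view */

/*
 * Hypothetical helper: return true if TABLESAMPLE may be applied to a
 * relation of the given kind. Only tables and matviews have heap storage
 * to sample from; anything else is rejected.
 */
static bool
demo_tablesample_allowed(char relkind)
{
	return relkind == DEMO_RELKIND_RELATION ||
		relkind == DEMO_RELKIND_MATVIEW;
}
```

In this sketch a caller would raise an error whenever
demo_tablesample_allowed() returns false, which covers the view case that
Tomas reported.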
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-separate-block-sampling-functions-v2.patch (text/x-diff)
From 3d7b57ffca70a067d31ed3a99bc9a07f2b836372 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:36:56 +0100
Subject: [PATCH 1/3] separate block sampling functions v2
---
contrib/file_fdw/file_fdw.c | 9 +-
contrib/postgres_fdw/postgres_fdw.c | 10 +-
src/backend/commands/analyze.c | 225 +----------------------------------
src/backend/utils/misc/Makefile | 2 +-
src/backend/utils/misc/sampling.c | 226 ++++++++++++++++++++++++++++++++++++
src/include/commands/vacuum.h | 3 -
src/include/utils/sampling.h | 44 +++++++
7 files changed, 287 insertions(+), 232 deletions(-)
create mode 100644 src/backend/utils/misc/sampling.c
create mode 100644 src/include/utils/sampling.h
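
For orientation: the block sampler that this patch moves from analyze.c
into sampling.c is, per its own comments, Knuth's Algorithm S with a single
random draw per selected block. A minimal standalone sketch of that
selection loop might look as follows. It is not the patch code:
demo_random_fract and demo_sample_blocks are hypothetical names, and rand()
stands in for the backend's random source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for sampler_random_fract(): uniform in (0, 1). */
static double
demo_random_fract(void)
{
	return ((double) rand() + 1) / ((double) RAND_MAX + 2);
}

/*
 * Select min(n, N) block numbers out of 0 .. N-1, in ascending order,
 * storing them in out[]. Returns the number of blocks selected.
 */
static int
demo_sample_blocks(unsigned N, int n, unsigned *out)
{
	unsigned	t = 0;			/* blocks scanned so far */
	int			m = 0;			/* blocks selected so far */

	assert(out != NULL);

	while (t < N && m < n)
	{
		unsigned	K = N - t;	/* remaining blocks */
		int			k = n - m;	/* blocks still to sample */

		if ((unsigned) k < K)
		{
			/* One draw per selected block; p is the skip probability. */
			double		V = demo_random_fract();
			double		p = 1.0 - (double) k / (double) K;

			while (V < p)
			{
				t++;			/* skip this block */
				K--;			/* keep K == N - t */
				/* shrink p to the cutoff for the reduced range */
				p *= 1.0 - (double) k / (double) K;
			}
		}
		/* otherwise all remaining blocks are needed */

		out[m++] = t++;			/* select block t */
	}
	return m;
}
```

Once K has shrunk to k, the factor 1 - k/K is exactly zero, so p becomes
zero and the skip loop stops; that is why the sketch always returns n blocks
when n <= N, and all N blocks otherwise.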
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 4368897..249d541 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -34,6 +34,7 @@
#include "optimizer/var.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -1005,7 +1006,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
{
int numrows = 0;
double rowstoskip = -1; /* -1 means not set yet */
- double rstate;
+ ReservoirStateData rstate;
TupleDesc tupDesc;
Datum *values;
bool *nulls;
@@ -1043,7 +1044,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
ALLOCSET_DEFAULT_MAXSIZE);
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
@@ -1087,7 +1088,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* not-yet-incremented value of totalrows as t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(*totalrows, targrows, &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, *totalrows, targrows);
if (rowstoskip <= 0)
{
@@ -1095,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 63f0577..59aaff7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -37,6 +37,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -202,7 +203,7 @@ typedef struct PgFdwAnalyzeState
/* for random sampling */
double samplerows; /* # of rows fetched */
double rowstoskip; /* # of rows to skip before next sample */
- double rstate; /* random state */
+ ReservoirStateData rstate; /* state for reservoir sampling */
/* working memory contexts */
MemoryContext anl_cxt; /* context for per-analyze lifespan data */
@@ -2393,7 +2394,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
astate.numrows = 0;
astate.samplerows = 0;
astate.rowstoskip = -1; /* -1 means not set yet */
- astate.rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&astate.rstate, targrows);
/* Remember ANALYZE context, and create a per-tuple temp context */
astate.anl_cxt = CurrentMemoryContext;
@@ -2533,13 +2534,12 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
* analyze.c; see Jeff Vitter's paper.
*/
if (astate->rowstoskip < 0)
- astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
- &astate->rstate);
+ astate->rowstoskip = reservoir_get_next_S(&astate->rstate, astate->samplerows, targrows);
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * anl_random_fract());
+ pos = (int) (targrows * sampler_random_fract());
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d2856a3..fc9dd44 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -88,10 +78,6 @@ static BufferAccessStrategy vac_strategy;
static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -947,94 +933,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1080,7 +978,7 @@ acquire_sample_rows(Relation onerel, int elevel,
BlockNumber totalblocks;
TransactionId OldestXmin;
BlockSamplerData bs;
- double rstate;
+ ReservoirStateData rstate;
Assert(targrows > 0);
@@ -1090,9 +988,9 @@ acquire_sample_rows(Relation onerel, int elevel,
OldestXmin = GetOldestXmin(onerel, true);
/* Prepare for sampling block numbers */
- BlockSampler_Init(&bs, totalblocks, targrows);
+ BlockSampler_Init(&bs, totalblocks, targrows, random());
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Outer loop over blocks to sample */
while (BlockSampler_HasMore(&bs))
@@ -1240,8 +1138,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(samplerows, targrows,
- &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, samplerows, targrows);
if (rowstoskip <= 0)
{
@@ -1249,7 +1146,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1308,116 +1205,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
-/*
- * These two routines embody Algorithm Z from "Random sampling with a
- * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
- * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
- * of the count S of records to skip before processing another record.
- * It is computed primarily based on t, the number of records already read.
- * The only extra state needed between calls is W, a random state variable.
- *
- * anl_init_selection_state computes the initial W value.
- *
- * Given that we've already read t records (t >= n), anl_get_next_S
- * determines the number of records to skip before the next record is
- * processed.
- */
-double
-anl_init_selection_state(int n)
-{
- /* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
-}
-
-double
-anl_get_next_S(double t, int n, double *stateptr)
-{
- double S;
-
- /* The magic constant here is T from Vitter's paper */
- if (t <= (22.0 * n))
- {
- /* Process records using Algorithm X until t is large enough */
- double V,
- quot;
-
- V = anl_random_fract(); /* Generate V */
- S = 0;
- t += 1;
- /* Note: "num" in Vitter's code is always equal to t - n */
- quot = (t - (double) n) / t;
- /* Find min S satisfying (4.1) */
- while (quot > V)
- {
- S += 1;
- t += 1;
- quot *= (t - (double) n) / t;
- }
- }
- else
- {
- /* Now apply Algorithm Z */
- double W = *stateptr;
- double term = t - (double) n + 1;
-
- for (;;)
- {
- double numer,
- numer_lim,
- denom;
- double U,
- X,
- lhs,
- rhs,
- y,
- tmp;
-
- /* Generate U and X */
- U = anl_random_fract();
- X = t * (W - 1.0);
- S = floor(X); /* S is tentatively set to floor(X) */
- /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
- tmp = (t + 1) / term;
- lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
- rhs = (((t + X) / (term + S)) * term) / t;
- if (lhs <= rhs)
- {
- W = rhs / lhs;
- break;
- }
- /* Test if U <= f(S)/cg(X) */
- y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
- if ((double) n < S)
- {
- denom = t;
- numer_lim = term + S;
- }
- else
- {
- denom = t - (double) n + S;
- numer_lim = t + 1;
- }
- for (numer = t + S; numer >= numer_lim; numer -= 1)
- {
- y *= numer / denom;
- denom -= 1;
- }
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
- if (exp(log(y) / n) <= (t + X) / t)
- break;
- }
- *stateptr = W;
- }
- return S;
-}
-
/*
* qsort comparator for sorting rows[] array
*/
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 378b77e..7889101 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o rls.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..1eeabaf
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Relation block sampling routines.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler provides an algorithm for block-level sampling of a relation,
+ * as discussed on pgsql-hackers 2004-04-02 (subject "Large DB").
+ * It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
+ long randseed)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires a sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+/*
+ * These two routines embody Algorithm Z from "Random sampling with a
+ * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
+ * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
+ * of the count S of records to skip before processing another record.
+ * It is computed primarily based on t, the number of records already read.
+ * The only extra state needed between calls is W, a random state variable.
+ *
+ * reservoir_init_selection_state computes the initial W value.
+ *
+ * Given that we've already read t records (t >= n), reservoir_get_next_S
+ * determines the number of records to skip before the next record is
+ * processed.
+ */
+void
+reservoir_init_selection_state(ReservoirState rs, int n)
+{
+ /* Initial value of W (for use when Algorithm Z is first applied) */
+ *rs = exp(-log(sampler_random_fract()) / n);
+}
+
+double
+reservoir_get_next_S(ReservoirState rs, double t, int n)
+{
+ double S;
+
+ /* The magic constant here is T from Vitter's paper */
+ if (t <= (22.0 * n))
+ {
+ /* Process records using Algorithm X until t is large enough */
+ double V,
+ quot;
+
+ V = sampler_random_fract(); /* Generate V */
+ S = 0;
+ t += 1;
+ /* Note: "num" in Vitter's code is always equal to t - n */
+ quot = (t - (double) n) / t;
+ /* Find min S satisfying (4.1) */
+ while (quot > V)
+ {
+ S += 1;
+ t += 1;
+ quot *= (t - (double) n) / t;
+ }
+ }
+ else
+ {
+ /* Now apply Algorithm Z */
+ double W = *rs;
+ double term = t - (double) n + 1;
+
+ for (;;)
+ {
+ double numer,
+ numer_lim,
+ denom;
+ double U,
+ X,
+ lhs,
+ rhs,
+ y,
+ tmp;
+
+ /* Generate U and X */
+ U = sampler_random_fract();
+ X = t * (W - 1.0);
+ S = floor(X); /* S is tentatively set to floor(X) */
+ /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
+ tmp = (t + 1) / term;
+ lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
+ rhs = (((t + X) / (term + S)) * term) / t;
+ if (lhs <= rhs)
+ {
+ W = rhs / lhs;
+ break;
+ }
+ /* Test if U <= f(S)/cg(X) */
+ y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
+ if ((double) n < S)
+ {
+ denom = t;
+ numer_lim = term + S;
+ }
+ else
+ {
+ denom = t - (double) n + S;
+ numer_lim = t + 1;
+ }
+ for (numer = t + S; numer >= numer_lim; numer -= 1)
+ {
+ y *= numer / denom;
+ denom -= 1;
+ }
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ if (exp(log(y) / n) <= (t + X) / t)
+ break;
+ }
+ *rs = W;
+ }
+ return S;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+
+/* Select a random value R uniformly distributed in (0 - 1) */
+double
+sampler_random_fract(void)
+{
+ return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+}
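
The comment in BlockSampler_Next about reusing a single random draw is easier to follow in isolation. Below is a minimal standalone sketch of the same loop, not the patch code itself: the demo_* names and the small LCG standing in for random() are assumptions made so that the example is reproducible outside the backend.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for random(): a small 64-bit LCG, reproducible. */
typedef struct
{
	uint64_t	state;
} demo_rng;

/* Uniform value in (0, 1), same (r + 1) / (MAX + 2) shape as the patch. */
static double
demo_random_fract(demo_rng *rng)
{
	rng->state = rng->state * 6364136223846793005ULL + 1442695040888963407ULL;
	return ((double) (rng->state >> 33) + 1) / (2147483647.0 + 2);
}

/* Mirrors BlockSamplerData, plus the demo generator. */
typedef struct
{
	uint32_t	N;			/* number of blocks, known in advance */
	int			n;			/* desired sample size */
	uint32_t	t;			/* blocks scanned so far */
	int			m;			/* blocks selected so far */
	demo_rng	rng;
} demo_sampler;

static int
demo_has_more(const demo_sampler *bs)
{
	return (bs->t < bs->N) && (bs->m < bs->n);
}

static uint32_t
demo_next(demo_sampler *bs)
{
	uint32_t	K = bs->N - bs->t;	/* remaining blocks */
	int			k = bs->n - bs->m;	/* blocks still to sample */
	double		p;					/* probability to skip block */
	double		V;					/* the single random draw */

	assert(demo_has_more(bs));

	if ((uint32_t) k >= K)
	{
		/* need all the rest */
		bs->m++;
		return bs->t++;
	}

	V = demo_random_fract(&bs->rng);
	p = 1.0 - (double) k / (double) K;
	while (V < p)
	{
		/* skip this block and shrink the cutoff instead of redrawing V */
		bs->t++;
		K--;
		p *= 1.0 - (double) k / (double) K;
	}

	bs->m++;
	return bs->t++;
}

/* Collect the sampled block numbers into out[]; returns how many. */
int
demo_sample_blocks(uint32_t nblocks, int samplesize, uint64_t seed,
				   uint32_t *out)
{
	demo_sampler bs = {nblocks, samplesize, 0, 0, {seed}};
	int			count = 0;

	while (demo_has_more(&bs))
		out[count++] = demo_next(&bs);
	return count;
}
```

Calling demo_sample_blocks(100, 10, seed, out) should always yield exactly 10 block numbers in increasing order, whatever the seed, which is the property the comment argues for.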
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4275484..d38fead 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -178,8 +178,5 @@ extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
bool in_outer_xact, BufferAccessStrategy bstrategy);
extern bool std_typanalyze(VacAttrStats *stats);
-extern double anl_random_fract(void);
-extern double anl_init_selection_state(int n);
-extern double anl_get_next_S(double t, int n, double *stateptr);
#endif /* VACUUM_H */
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..e3e7f9c
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+extern double sampler_random_fract(void);
+
+/* Block sampling methods */
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize, long randseed);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Reservoir sampling methods */
+typedef double ReservoirStateData;
+typedef ReservoirStateData *ReservoirState;
+
+extern void reservoir_init_selection_state(ReservoirState rs, int n);
+extern double reservoir_get_next_S(ReservoirState rs, double t, int n);
+
+#endif /* SAMPLING_H */
--
1.9.1
Attachment: 0002-tablesample-v9.patch (text/x-diff)
From 1de801e5e46891389641bfc1303634141839ba01 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:37:55 +0100
Subject: [PATCH 2/3] tablesample v9
---
contrib/file_fdw/file_fdw.c | 2 +-
contrib/postgres_fdw/postgres_fdw.c | 2 +-
doc/src/sgml/catalogs.sgml | 112 +++++++
doc/src/sgml/ref/select.sgml | 38 ++-
src/backend/access/Makefile | 3 +-
src/backend/access/heap/heapam.c | 7 +-
src/backend/catalog/Makefile | 2 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/explain.c | 7 +
src/backend/executor/Makefile | 2 +-
src/backend/executor/execAmi.c | 8 +
src/backend/executor/execCurrent.c | 1 +
src/backend/executor/execProcnode.c | 14 +
src/backend/executor/nodeSamplescan.c | 500 ++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 59 ++++
src/backend/nodes/equalfuncs.c | 36 ++
src/backend/nodes/nodeFuncs.c | 12 +
src/backend/nodes/outfuncs.c | 47 +++
src/backend/nodes/readfuncs.c | 44 +++
src/backend/optimizer/path/allpaths.c | 49 +++
src/backend/optimizer/path/costsize.c | 68 ++++
src/backend/optimizer/plan/createplan.c | 69 ++++
src/backend/optimizer/plan/setrefs.c | 11 +
src/backend/optimizer/plan/subselect.c | 1 +
src/backend/optimizer/util/pathnode.c | 22 ++
src/backend/parser/gram.y | 40 ++-
src/backend/parser/parse_clause.c | 48 ++-
src/backend/parser/parse_func.c | 130 ++++++++
src/backend/utils/Makefile | 3 +-
src/backend/utils/adt/ruleutils.c | 50 +++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/misc/sampling.c | 33 +-
src/backend/utils/tablesample/Makefile | 17 +
src/backend/utils/tablesample/bernoulli.c | 224 +++++++++++++
src/backend/utils/tablesample/system.c | 185 ++++++++++
src/include/access/heapam.h | 1 +
src/include/access/relscan.h | 1 +
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_proc.h | 25 ++
src/include/catalog/pg_tablesample_method.h | 75 +++++
src/include/executor/nodeSamplescan.h | 24 ++
src/include/nodes/execnodes.h | 18 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/parsenodes.h | 35 ++
src/include/nodes/plannodes.h | 6 +
src/include/optimizer/cost.h | 1 +
src/include/optimizer/pathnode.h | 2 +
src/include/parser/kwlist.h | 3 +-
src/include/parser/parse_func.h | 4 +
src/include/utils/rel.h | 1 -
src/include/utils/sampling.h | 15 +-
src/include/utils/syscache.h | 2 +
src/include/utils/tablesample.h | 27 ++
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/expected/tablesample.out | 168 ++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/tablesample.sql | 42 +++
58 files changed, 2304 insertions(+), 30 deletions(-)
create mode 100644 src/backend/executor/nodeSamplescan.c
create mode 100644 src/backend/utils/tablesample/Makefile
create mode 100644 src/backend/utils/tablesample/bernoulli.c
create mode 100644 src/backend/utils/tablesample/system.c
create mode 100644 src/include/catalog/pg_tablesample_method.h
create mode 100644 src/include/executor/nodeSamplescan.h
create mode 100644 src/include/utils/tablesample.h
create mode 100644 src/test/regress/expected/tablesample.out
create mode 100644 src/test/regress/sql/tablesample.sql
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d541..6a813a3 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1096,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 59aaff7..5b2335f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2539,7 +2539,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * sampler_random_fract());
+ pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate));
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
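
For context on the two FDW hunks above: both callers follow the same pattern as ANALYZE, i.e. fill the reservoir with the first targrows rows, then ask the sampler how many rows to skip, and overwrite a random slot when the skip count reaches zero. A rough standalone sketch of that caller loop follows. It assumes only the simple "Algorithm X" branch of reservoir_get_next_S (Vitter's Algorithm Z is left out), and the demo_* names and the LCG are hypothetical stand-ins, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sampler's random state. */
typedef struct
{
	uint64_t	state;
} demo_rng;

/* Uniform value in (0, 1), same (r + 1) / (MAX + 2) shape as the patch. */
static double
demo_random_fract(demo_rng *rng)
{
	rng->state = rng->state * 6364136223846793005ULL + 1442695040888963407ULL;
	return ((double) (rng->state >> 33) + 1) / (2147483647.0 + 2);
}

/* Algorithm X only: number of records to skip after t records were read. */
static double
demo_next_skip(demo_rng *rng, double t, int n)
{
	double		V = demo_random_fract(rng);
	double		S = 0;
	double		quot;

	t += 1;
	quot = (t - (double) n) / t;
	while (quot > V)
	{
		S += 1;
		t += 1;
		quot *= (t - (double) n) / t;
	}
	return S;
}

/*
 * Keep a sample of at most n values from stream[]; returns how many
 * reservoir slots are filled.
 */
int
demo_reservoir_sample(const int *stream, int nstream,
					  int *reservoir, int n, uint64_t seed)
{
	demo_rng	rng = {seed};
	double		seen = 0;		/* records read so far */
	double		toskip = -1;	/* -1 means "not yet computed" */
	int			filled = 0;

	for (int i = 0; i < nstream; i++)
	{
		if (filled < n)
			reservoir[filled++] = stream[i];	/* initial fill */
		else
		{
			if (toskip < 0)
				toskip = demo_next_skip(&rng, seen, n);
			if (toskip <= 0)
			{
				/* replace one old element at random */
				int			k = (int) (n * demo_random_fract(&rng));

				assert(k >= 0 && k < n);
				reservoir[k] = stream[i];
			}
			toskip -= 1;
		}
		seen += 1;
	}
	return filled;
}
```

The point of threading the random state through (as the hunks above do with rstate.randstate) is that the slot choice and the skip computation then draw from the same seeded sequence.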
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 515a40e..6b4e32b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -269,6 +269,11 @@
</row>
<row>
+ <entry><link linkend="catalog-pg-tablesample-method"><structname>pg_tablesample_method</structname></link></entry>
+ <entry>table sampling methods</entry>
+ </row>
+
+ <row>
<entry><link linkend="catalog-pg-tablespace"><structname>pg_tablespace</structname></link></entry>
<entry>tablespaces within this database cluster</entry>
</row>
@@ -5989,6 +5994,113 @@
</sect1>
+ <sect1 id="catalog-pg-tablesample-method">
+ <title><structname>pg_tablesample_method</structname></title>
+
+ <indexterm zone="catalog-pg-tablesample-method">
+ <primary>pg_tablesample_method</primary>
+ </indexterm>
+
+ <para>
+ The catalog <structname>pg_tablesample_method</structname> stores
+ information about table sampling methods that can be used in the
+ <command>TABLESAMPLE</command> clause of a <command>SELECT</command>
+ statement.
+ </para>
+
+ <table>
+ <title><structname>pg_tablesample_method</> Columns</title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+
+ <row>
+ <entry><structfield>oid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry></entry>
+ <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmname</structfield></entry>
+ <entry><type>name</type></entry>
+ <entry></entry>
+ <entry>Name of the sampling method</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmseqscan</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method scan the whole table sequentially?
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsminit</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Initialize the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnextblock</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next block number</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnexttuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next tuple offset</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmexaminetuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Function which examines the tuple contents and decides whether to
+ return it, or zero if none</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmend</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>End the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmreset</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Restart the state of sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmcost</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Costing function</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect1>
+
+
<sect1 id="catalog-pg-tablespace">
<title><structname>pg_tablespace</structname></title>
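
To make the division of labour between tsmnextblock and tsmnexttuple concrete, here is a small standalone model of how a scan could drive them: an invalid offset means "done with this block", an invalid block number means "end of scan". This is only a sketch under assumptions. The demo_* types, the plain function pointers (the patch goes through fmgr), and the toy method that returns every tuple of every other block are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_INVALID_BLOCK	UINT32_MAX
#define DEMO_INVALID_OFFSET 0	/* valid offsets start at 1 */

/* Hypothetical, simplified view of a sampling method's callbacks. */
typedef struct
{
	uint32_t	(*nextblock) (void *state);
	uint16_t	(*nexttuple) (void *state, uint32_t blockno,
							  uint16_t maxoffset);
} demo_method;

typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} demo_tid;

/* Toy method state: visit blocks 0, 2, 4, ... and return all their tuples. */
typedef struct
{
	uint32_t	nblocks;
	uint32_t	next_block;
	uint16_t	last_offset;
} demo_even_state;

static uint32_t
demo_even_nextblock(void *state)
{
	demo_even_state *st = (demo_even_state *) state;
	uint32_t	blockno;

	if (st->next_block >= st->nblocks)
		return DEMO_INVALID_BLOCK;	/* end of scan */
	blockno = st->next_block;
	st->next_block += 2;
	st->last_offset = 0;			/* restart tuple numbering */
	return blockno;
}

static uint16_t
demo_even_nexttuple(void *state, uint32_t blockno, uint16_t maxoffset)
{
	demo_even_state *st = (demo_even_state *) state;

	(void) blockno;
	if (st->last_offset >= maxoffset)
		return DEMO_INVALID_OFFSET; /* block exhausted */
	return ++st->last_offset;
}

const demo_method demo_even_method = {
	demo_even_nextblock, demo_even_nexttuple
};

/* Driver loop: collects the (block, offset) pairs the method selects. */
int
demo_scan(const demo_method *tsm, void *state, uint16_t tuples_per_block,
		  demo_tid *out, int max_out)
{
	int			nout = 0;
	uint32_t	blockno = tsm->nextblock(state);

	while (blockno != DEMO_INVALID_BLOCK)
	{
		uint16_t	off = tsm->nexttuple(state, blockno, tuples_per_block);

		if (off != DEMO_INVALID_OFFSET)
		{
			assert(nout < max_out);
			out[nout].block = blockno;
			out[nout].offset = off;
			nout++;
			continue;			/* try next tuple from the same block */
		}
		blockno = tsm->nextblock(state);
	}
	return nout;
}
```

Visibility checks, tsmexaminetuple, tsmreset and tsmcost are deliberately left out; the sketch only shows the block/tuple iteration contract.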
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..407bf9d 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,42 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ A <literal>TABLESAMPLE</literal> clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of the rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ one of:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept only a single argument,
+ which is the percentage (floating point from 0 to 100) of the rows to
+ be returned.
+ The <literal>SYSTEM</literal> sampling method does block-level
+ sampling, with each block having the same chance of being selected, and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> method scans the whole table and returns
+ individual rows with equal probability.
+ The optional numeric parameter <literal>REPEATABLE</literal> is used
+ as the random seed for sampling. Note that subsequent commands may return
+ different results even if the same <literal>REPEATABLE</literal> clause
+ was specified. This happens because <acronym>DML</acronym> statements
+ and maintenance operations such as <command>VACUUM</> affect the physical
+ distribution of data.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
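
The practical difference between the two documented methods is the granularity of the random choice. The toy model below counts how many rows each would return from a table of nblocks blocks with rows_per_block rows each. It is a simplified sketch, not what the patch does: it flips an independent coin per block for the SYSTEM case (the patch uses the ANALYZE block sampler instead), and the demo_* names and the LCG are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sampler's random state. */
typedef struct
{
	uint64_t	state;
} demo_rng;

/* Uniform value in (0, 1), same (r + 1) / (MAX + 2) shape as the patch. */
static double
demo_random_fract(demo_rng *rng)
{
	rng->state = rng->state * 6364136223846793005ULL + 1442695040888963407ULL;
	return ((double) (rng->state >> 33) + 1) / (2147483647.0 + 2);
}

/* BERNOULLI-like: visit every row, keep each one with probability percent/100. */
int
demo_bernoulli_rows(int nblocks, int rows_per_block, double percent,
					uint64_t seed)
{
	demo_rng	rng = {seed};
	int			returned = 0;

	assert(percent >= 0.0 && percent <= 100.0);
	for (int b = 0; b < nblocks; b++)
		for (int r = 0; r < rows_per_block; r++)
			if (demo_random_fract(&rng) < percent / 100.0)
				returned++;
	return returned;
}

/* SYSTEM-like: keep whole blocks, each with probability percent/100. */
int
demo_system_rows(int nblocks, int rows_per_block, double percent,
				 uint64_t seed)
{
	demo_rng	rng = {seed};
	int			returned = 0;

	assert(percent >= 0.0 && percent <= 100.0);
	for (int b = 0; b < nblocks; b++)
		if (demo_random_fract(&rng) < percent / 100.0)
			returned += rows_per_block; /* all rows of a chosen block */
	return returned;
}
```

In this model the SYSTEM-like count is always a multiple of rows_per_block, which is why block-level sampling is cheaper but gives a more clustered sample than row-level sampling.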
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..238057a 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ transam
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb6f8a3..4cd3223 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -293,9 +293,10 @@ initscan(HeapScanDesc scan, ScanKey key, bool is_rescan)
/*
* Currently, we don't have a stats counter for bitmap heap scans (but the
- * underlying bitmap index scans will be counted).
+ * underlying bitmap index scans will be counted) or sample scans (we only
+ * update stats for tuple fetches there).
*/
- if (!scan->rs_bitmapscan)
+ if (!scan->rs_bitmapscan && !scan->rs_samplescan)
pgstat_count_heap_scan(scan->rs_rd);
}
@@ -314,7 +315,7 @@ heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, BlockNumber numBlks)
* In page-at-a-time mode it performs additional work, namely determining
* which tuples on the page are visible.
*/
-static void
+void
heapgetpage(HeapScanDesc scan, BlockNumber page)
{
Buffer buffer;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..34db3e6 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesample_method.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fc9dd44..63feb07 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1146,7 +1146,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a951c55..20cc7a1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -732,6 +732,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -957,6 +958,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ pname = sname = "Sample Scan";
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1074,6 +1078,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1326,6 +1331,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2220,6 +2226,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 6ebad2f..4948a26 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index 1c8be25..5cfe549 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 9892499..03c2feb 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..34ea4ab
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,500 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/pg_tablesample_method.h"
+#include "executor/executor.h"
+#include "executor/nodeSamplescan.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/predicate.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate,
+ int eflags, TableSampleClause *tablesample);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+static HeapTuple samplenexttup(SampleScanState *node, HeapScanDesc scan);
+
+
+/*
+ * Initialize the sampling method - loads function info and
+ * calls the tsminit function.
+ *
+ * We need special handling for this because the tsminit function
+ * is allowed to take optional additional arguments.
+ */
+static void
+InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(scanstate->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(scanstate->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(scanstate->tsmnexttuple));
+ if (OidIsValid(tablesample->tsmexaminetuple))
+ fmgr_info(tablesample->tsmexaminetuple, &(scanstate->tsmexaminetuple));
+ else
+ scanstate->tsmexaminetuple.fn_oid = InvalidOid;
+ fmgr_info(tablesample->tsmend, &(scanstate->tsmend));
+ fmgr_info(tablesample->tsmreset, &(scanstate->tsmreset));
+
+ InitFunctionCallInfoData(fcinfo, &scanstate->tsminit,
+ list_length(args) + 2,
+ InvalidOid, NULL, NULL);
+
+ /* First arg is always SampleScanState */
+ fcinfo.arg[0] = PointerGetDatum(scanstate);
+ fcinfo.argnull[0] = false;
+
+ /*
+ * Second arg is always REPEATABLE.
+ * When tablesample->repeatable is NULL, the REPEATABLE clause was not
+ * specified.
+ * When specified, the expression cannot evaluate to NULL.
+ */
+ if (tablesample->repeatable)
+ {
+ ExprState *argstate = ExecInitExpr((Expr *) tablesample->repeatable,
+ (PlanState *) scanstate);
+ fcinfo.arg[1] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[1], NULL);
+ if (fcinfo.argnull[1])
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("REPEATABLE clause cannot be NULL")));
+ }
+ else
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+
+ i = 2;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ (void) FunctionCallInvoke(&fcinfo);
+}
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is a workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ TupleTableSlot *slot;
+ HeapScanDesc scan;
+ HeapTuple tuple;
+
+ /*
+ * get information from the scan state
+ */
+ slot = node->ss.ss_ScanTupleSlot;
+ scan = node->ss.ss_currentScanDesc;
+
+ tuple = samplenexttup(node, scan);
+
+ if (tuple)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ scan->rs_cbuf, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+static HeapTuple
+samplenexttup(SampleScanState *node, HeapScanDesc scan)
+{
+ HeapTuple tuple = &(scan->rs_ctup);
+ Snapshot snapshot = scan->rs_snapshot;
+ BlockNumber blockno;
+ Page page;
+ ItemId itemid;
+ OffsetNumber tupoffset,
+ maxoffset;
+
+ if (!scan->rs_inited)
+ {
+ /*
+ * return null immediately if relation is empty
+ */
+ if (scan->rs_nblocks == 0)
+ {
+ Assert(!BufferIsValid(scan->rs_cbuf));
+ tuple->t_data = NULL;
+ return NULL;
+ }
+ blockno = DatumGetInt32(FunctionCall1(&node->tsmnextblock,
+ PointerGetDatum(node)));
+ if (!BlockNumberIsValid(blockno))
+ {
+ tuple->t_data = NULL;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+ scan->rs_inited = true;
+ }
+ else
+ {
+ /* continue from previously returned page/tuple */
+ blockno = scan->rs_cblock; /* current page */
+ }
+
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ maxoffset = PageGetMaxOffsetNumber(page);
+
+ for (;;)
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ tupoffset = DatumGetUInt16(FunctionCall3(&node->tsmnexttuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset)));
+
+ if (OffsetNumberIsValid(tupoffset))
+ {
+ bool visible;
+ bool found;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_data = (HeapTupleHeader) PageGetItem((Page) page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&(tuple->t_self), blockno, tupoffset);
+
+ visible = HeapTupleSatisfiesVisibility(tuple, snapshot, scan->rs_cbuf);
+
+ CheckForSerializableConflictOut(visible, scan->rs_rd, tuple,
+ scan->rs_cbuf, snapshot);
+
+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples for
+ * statistical purposes, but never return them, since the user
+ * must never see invisible tuples.
+ */
+ if (OidIsValid(node->tsmexaminetuple.fn_oid))
+ {
+ found = DatumGetBool(FunctionCall4(&node->tsmexaminetuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ PointerGetDatum(tuple),
+ BoolGetDatum(visible)));
+ /* Should not happen if sampling method is well written. */
+ if (found && !visible)
+ elog(ERROR, "sampling method wanted to return an invisible tuple");
+ }
+ else
+ found = visible;
+
+ /* Found visible tuple, return it. */
+ if (found)
+ {
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ break;
+ }
+ else
+ {
+ /* Try next tuple from same page. */
+ continue;
+ }
+ }
+
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ blockno = DatumGetUInt32(FunctionCall1(&node->tsmnextblock,
+ PointerGetDatum(node)));
+
+ /*
+ * Report our new scan position for synchronization purposes. We
+ * don't do that when moving backwards, however. That would just
+ * mess up any other forward-moving scanners.
+ *
+ * Note: we do this before checking for end of scan so that the
+ * final state of the position hint is back at the start of the
+ * rel. That's not strictly necessary, but otherwise when you run
+ * the same query multiple times the starting position would shift
+ * a little bit backwards on every invocation, which is confusing.
+ * We don't guarantee any specific ordering in general, though.
+ */
+ if (scan->rs_syncscan)
+ ss_report_location(scan->rs_rd, BlockNumberIsValid(blockno) ?
+ blockno : scan->rs_startblock);
+
+ /*
+ * Reached end of scan.
+ */
+ if (!BlockNumberIsValid(blockno))
+ {
+ if (BufferIsValid(scan->rs_cbuf))
+ ReleaseBuffer(scan->rs_cbuf);
+ scan->rs_cbuf = InvalidBuffer;
+ scan->rs_cblock = InvalidBlockNumber;
+ tuple->t_data = NULL;
+ scan->rs_inited = false;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ pgstat_count_heap_getnext(scan->rs_rd);
+
+ return &(scan->rs_ctup);
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation using the sampling method and returns
+ * the next qualifying tuple.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+ TableSampleClause *tablesample)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+
+ /*
+ * Even though we aren't going to do a conventional seqscan, it is useful
+ * to create a HeapScanDesc --- many of the fields in it are usable.
+ */
+ node->ss.ss_currentScanDesc =
+ heap_beginscan_strat(currentRelation,
+ estate->es_snapshot,
+ 0, NULL, false,
+ tablesample->tsmseqscan);
+
+ /*
+ * Page-at-a-time mode is useless for us: we need to check visibility
+ * of tuples individually because tuple offsets returned by sampling
+ * methods map to rs_vistuples values, not to its indexes.
+ */
+ node->ss.ss_currentScanDesc->rs_pageatatime = false;
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags, rte->tablesample);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ InitSamplingMethod(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell sampling function that we finished the scan.
+ */
+ FunctionCall1(&node->tsmend, PointerGetDatum(node));
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close heap scan
+ */
+ heap_endscan(node->ss.ss_currentScanDesc);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *node)
+{
+ heap_rescan(node->ss.ss_currentScanDesc, NULL);
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ FunctionCall1(&node->tsmreset, PointerGetDatum(node));
+
+ ExecScanReScan(&node->ss);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e5b0dce..d47b6ca 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -629,6 +629,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2007,6 +2023,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2139,6 +2156,39 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsmseqscan);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmexaminetuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_SCALAR_FIELD(tsmcost);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4076,6 +4126,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4724,6 +4777,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_PrivGrantee:
retval = _copyPrivGrantee(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 6e8b308..165c4e5 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2323,6 +2323,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2442,6 +2443,35 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsmseqscan);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmexaminetuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_SCALAR_FIELD(tsmcost);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3150,6 +3180,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_PrivGrantee:
retval = _equalPrivGrantee(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 21dfda7..bd9ce09 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3209,6 +3209,18 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->repeatable, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8486448..a4f92ad 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -579,6 +579,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -2392,6 +2400,35 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_BOOL_FIELD(tsmseqscan);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmexaminetuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2421,6 +2458,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2888,6 +2926,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3229,6 +3270,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index ae24d05..076c958 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,45 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_BOOL_FIELD(tsmseqscan);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmexaminetuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_OID_FIELD(tsmcost);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1216,6 +1255,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1311,6 +1351,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..5f12477 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,10 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -265,6 +269,11 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_size(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Sampled relation */
+ set_tablesample_rel_size(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -332,6 +341,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +432,41 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_size
+ * Set size estimates for a sampled relation.
+ */
+static void
+set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ /* Mark rel with estimated output rows, width, etc */
+ set_baserel_size_estimates(root, rel);
+}
+
+/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+ Path *path;
+
+ /*
+ * We don't support pushing join clauses into the quals of a samplescan, but
+ * it could still have required parameterization due to LATERAL refs in
+ * its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ path = create_samplescan_path(root, rel, required_outer);
+ rel->pathlist = list_make1(path);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 78ef229..d8523de 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -90,6 +90,7 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
#include "utils/spccache.h"
+#include "utils/tablesample.h"
#include "utils/tuplesort.h"
@@ -219,6 +220,73 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * From the planner/optimizer perspective, we don't care all that much about
+ * the cost itself, since there is always only one scan path to consider when
+ * a sampling scan is present, but the row count estimate is still important.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'path->param_info' is the ParamPathInfo if this is a parameterized path, else NULL
+ */
+void
+cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+ BlockNumber pages;
+ double tuples;
+ RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
+ TableSampleClause *tablesample = rte->tablesample;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Mark the path with the correct row estimate */
+ if (path->param_info)
+ path->rows = path->param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ /* Call the sampling method's costing function. */
+ OidFunctionCall6(tablesample->tsmcost, PointerGetDatum(root),
+ PointerGetDatum(path), PointerGetDatum(baserel),
+ PointerGetDatum(tablesample->args),
+ PointerGetDatum(&pages), PointerGetDatum(&tuples));
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+
+ spc_page_cost = tablesample->tsmseqscan ? spc_seq_page_cost :
+ spc_random_page_cost;
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_page_cost * pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, path->param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * tuples;
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 76ba1bf..2aaa2b1 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -58,6 +58,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -100,6 +102,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -228,6 +231,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -343,6 +347,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -546,6 +557,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1133,6 +1145,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3318,6 +3369,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ec828cd..82771dc 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -445,6 +445,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 5a1d539..84b305f 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2163,6 +2163,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 1395a21..014d670 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,26 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+Path *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ Path *pathnode = makeNode(Path);
+
+ pathnode->pathtype = T_SampleScan;
+ pathnode->parent = rel;
+ pathnode->param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->pathkeys = NIL; /* samplescan has unordered result */
+
+ cost_samplescan(pathnode, root, rel);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1921,6 +1941,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return (Path *) create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 36dac29..ac5e095 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -447,6 +447,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -611,8 +612,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10227,6 +10228,12 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample opt_alias_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10522,7 +10529,6 @@ relation_expr_list:
| relation_expr_list ',' relation_expr { $$ = lappend($1, $3); }
;
-
/*
* Given "UPDATE foo set set ...", we have to decide without looking any
* further ahead whether the first "set" is an alias or the UPDATE's SET
@@ -10552,6 +10558,31 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $2;
+ n->relation = $1;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = $4;
+ n->repeatable = $6;
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13334,7 +13365,6 @@ unreserved_keyword:
| RELATIVE_P
| RELEASE
| RENAME
- | REPEATABLE
| REPLACE
| REPLICA
| RESET
@@ -13509,6 +13539,7 @@ type_func_name_keyword:
| OVERLAPS
| RIGHT
| SIMILAR
+ | TABLESAMPLE
| VERBOSE
;
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+ | REPEATABLE
| RETURNING
| SELECT
| SESSION_USER
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 654dce6..e3554b3 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -413,6 +416,28 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
+{
+ RangeTblEntry *rte;
+ TableSampleClause *tablesample = NULL;
+
+ rte = transformTableEntry(pstate, r->relation);
+
+ if (rte->relkind != RELKIND_RELATION &&
+ rte->relkind != RELKIND_MATVIEW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("TABLESAMPLE clause can only be used on tables and materialized views"),
+ parser_errposition(pstate,
+ exprLocation((Node *) r))));
+
+ tablesample = ParseTableSample(pstate, r->method, r->repeatable, r->args);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -421,7 +446,7 @@ transformTableEntry(ParseState *pstate, RangeVar *r)
{
RangeTblEntry *rte;
- /* We need only build a range table entry */
+ /* We first need to build a range table entry */
rte = addRangeTableEntry(pstate, r, r->alias,
interpretInhOption(r->inhOpt), true);
@@ -1121,6 +1146,27 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index a200804..541f415 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesample_method.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -760,6 +762,134 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
+ List *sampleargs)
+{
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, initnargs;
+ Oid actual_arg_types[FUNC_MAX_ARGS];
+ Oid init_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ samplemethod)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+
+ tablesample->tsmseqscan = tsm->tsmseqscan;
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmexaminetuple = tsm->tsmexaminetuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+ tablesample->tsmcost = tsm->tsmcost;
+
+ ReleaseSysCache(tuple);
+
+ /* Validate the parameters against init function definition. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ initnargs = procform->pronargs;
+ Assert(initnargs >= 3);
+
+ /*
+ * First parameter is used to pass the SampleScanState, second is
+ * seed (REPEATABLE), skip the processing for them here, just assert
+ * that the types are correct.
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ Assert(procform->proargtypes.values[1] == INT4OID);
+ initnargs -= 2;
+ memcpy(init_arg_types, procform->proargtypes.values + 2,
+ initnargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Process repeatable (seed) */
+ if (repeatable != NULL)
+ {
+ Node *arg = repeatable;
+
+ if (arg && IsA(arg, A_Const))
+ {
+ A_Const *con = (A_Const *) arg;
+
+ if (con->val.type == T_Null)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value"),
+ parser_errposition(pstate, con->location)));
+
+ }
+
+ arg = transformExpr(pstate, arg, EXPR_KIND_FROM_FUNCTION);
+ arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
+ tablesample->repeatable = arg;
+ }
+ else
+ tablesample->repeatable = NULL;
+
+ /* Transform the rest of arguments ... */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *arg = transformExpr(pstate, (Node *) lfirst(larg), EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ fargs = lappend(fargs, arg);
+
+ actual_arg_types[nargs++] = argtype;
+ }
+
+ /*
+ * Check if parameters are correct.
+ *
+ * XXX: can we do better at hinting here?
+ */
+ if (initnargs != nargs ||
+ !can_coerce_type(initnargs, actual_arg_types, init_arg_types,
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameters for tablesample method \"%s\"",
+ samplemethod)));
+
+ /* perform the necessary typecasting of arguments */
+ make_fn_arguments(pstate, fargs, actual_arg_types, init_arg_types);
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..9daa2ae 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,8 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time \
+ tablesample
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index c1d860c..8198fc7 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -31,6 +31,7 @@
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -343,6 +344,8 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags);
static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags, int wrapColumn);
+static void get_tablesample_def(TableSampleClause *tablesample,
+ deparse_context *context);
static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
TupleDesc resultDesc,
int prettyFlags, int wrapColumn, int startIndent);
@@ -4184,6 +4187,50 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
heap_close(ev_relation, AccessShareLock);
}
+/* ----------
+ * get_tablesample_def - Convert TableSampleClause back to SQL
+ * ----------
+ */
+static void
+get_tablesample_def(TableSampleClause *tablesample, deparse_context *context)
+{
+ StringInfo buf = context->buf;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ char *tsmname;
+ int nargs;
+ ListCell *l;
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tablesample->tsmid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("cache lookup failed for tablesample method %u",
+ tablesample->tsmid)));
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ tsmname = NameStr(tsm->tsmname);
+ appendStringInfo(buf, " TABLESAMPLE %s (", quote_identifier(tsmname));
+
+ ReleaseSysCache(tuple);
+
+ nargs = 0;
+ foreach(l, tablesample->args)
+ {
+ if (nargs++ > 0)
+ appendStringInfoString(buf, ", ");
+ get_rule_expr((Node *) lfirst(l), context, true);
+ }
+ appendStringInfoChar(buf, ')');
+
+ if (tablesample->repeatable != NULL)
+ {
+ appendStringInfoString(buf, " REPEATABLE (");
+ get_rule_expr(tablesample->repeatable, context, true);
+ appendStringInfoChar(buf, ')');
+ }
+}
/* ----------
* get_query_def - Parse back one query parsetree
@@ -8411,6 +8458,9 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
only_marker(rte),
generate_relation_name(rte->relid,
context->namespaces));
+
+ if (rte->tablesample)
+ get_tablesample_def(rte->tablesample, context);
break;
case RTE_SUBQUERY:
/* Subquery RTE */
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..3a8f01e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesample_method_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index 1eeabaf..f213c46 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -46,6 +46,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
bs->n = samplesize;
bs->t = 0; /* blocks scanned so far */
bs->m = 0; /* blocks selected so far */
+
+ sampler_random_init_state(randseed, bs->randstate);
}
bool
@@ -92,7 +94,7 @@ BlockSampler_Next(BlockSampler bs)
* less than k, which means that we cannot fail to select enough blocks.
*----------
*/
- V = sampler_random_fract();
+ V = sampler_random_fract(bs->randstate);
p = 1.0 - (double) k / (double) K;
while (V < p)
{
@@ -126,8 +128,14 @@ BlockSampler_Next(BlockSampler bs)
void
reservoir_init_selection_state(ReservoirState rs, int n)
{
+ /*
+ * Reservoir sampling is not used anywhere where it would need to return
+ * repeatable results so we can initialize it randomly.
+ */
+ sampler_random_init_state(random(), rs->randstate);
+
/* Initial value of W (for use when Algorithm Z is first applied) */
- *rs = exp(-log(sampler_random_fract()) / n);
+ rs->W = exp(-log(sampler_random_fract(rs->randstate)) / n);
}
double
@@ -142,7 +150,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
double V,
quot;
- V = sampler_random_fract(); /* Generate V */
+ V = sampler_random_fract(rs->randstate); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -158,7 +166,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
else
{
/* Now apply Algorithm Z */
- double W = *rs;
+ double W = rs->W;
double term = t - (double) n + 1;
for (;;)
@@ -174,7 +182,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
tmp;
/* Generate U and X */
- U = sampler_random_fract();
+ U = sampler_random_fract(rs->randstate);
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -203,11 +211,11 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract(rs->randstate)) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
- *rs = W;
+ rs->W = W;
}
return S;
}
@@ -217,10 +225,17 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
* Random number generator used by sampling
*----------
*/
+void
+sampler_random_init_state(long seed, SamplerRandomState randstate)
+{
+ randstate[0] = 0x330e;
+ randstate[1] = (unsigned short) seed;
+ randstate[2] = (unsigned short) (seed >> 16);
+}
/* Select a random value R uniformly distributed in (0 - 1) */
double
-sampler_random_fract()
+sampler_random_fract(SamplerRandomState randstate)
{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+ return pg_erand48(randstate);
}
diff --git a/src/backend/utils/tablesample/Makefile b/src/backend/utils/tablesample/Makefile
new file mode 100644
index 0000000..df92939
--- /dev/null
+++ b/src/backend/utils/tablesample/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/tablesample
+#
+# IDENTIFICATION
+# src/backend/utils/tablesample/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = system.o bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/tablesample/bernoulli.c b/src/backend/utils/tablesample/bernoulli.c
new file mode 100644
index 0000000..f7e9688
--- /dev/null
+++ b/src/backend/utils/tablesample/bernoulli.c
@@ -0,0 +1,224 @@
+/*-------------------------------------------------------------------------
+ *
+ * bernoulli.c
+ * interface routines for BERNOULLI tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/relscan.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* number of blocks */
+ BlockNumber blockno; /* current block */
+ float4 probability; /* probability that tuple will be returned (0.0-1.0) */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} BernoulliSamplerData;
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ HeapScanDesc scan = scanstate->ss.ss_currentScanDesc;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->startblock = scan->rs_startblock;
+ sampler->nblocks = scan->rs_nblocks;
+ sampler->blockno = InvalidBlockNumber;
+ sampler->probability = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ /*
+ * Bernoulli sampling scans all blocks of the table and supports
+ * syncscan, so loop from startblock to startblock instead of
+ * from 0 to nblocks.
+ */
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = sampler->startblock;
+ else
+ {
+ sampler->blockno++;
+
+ if (sampler->blockno >= sampler->nblocks)
+ sampler->blockno = 0;
+
+ if (sampler->blockno == sampler->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ *
+ * This method implements the main logic of Bernoulli sampling.
+ * For each tuple offset it generates a new random number (in the 0.0-1.0
+ * range) and, if that number falls within the user-specified probability
+ * (in the same range), returns the tuple offset.
+ *
+ * If we reach the end of the block, return InvalidOffsetNumber, which tells
+ * SampleScan to go to the next block.
+ */
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ float4 probability = sampler->probability;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /*
+ * Loop over tuple offsets until the random generator returns a value that
+ * is within the probability of returning the tuple or until we reach
+ * the end of the block.
+ *
+ * (This is our implementation of the Bernoulli trial.)
+ */
+ while (sampler_random_fract(sampler->randstate) >= probability)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_bernoulli_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ *pages = baserel->pages;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1;
+ }
+
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/backend/utils/tablesample/system.c b/src/backend/utils/tablesample/system.c
new file mode 100644
index 0000000..0c4da28
--- /dev/null
+++ b/src/backend/utils/tablesample/system.c
@@ -0,0 +1,185 @@
+/*-------------------------------------------------------------------------
+ *
+ * system.c
+ * interface routines for SYSTEM tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/*
+ * State
+ */
+typedef struct
+{
+ BlockSamplerData bs;
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ int samplesize; /* number of blocks to return */
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks =
+ RelationGetNumberOfBlocks(scanstate->ss.ss_currentRelation);
+ sampler->samplesize = 1 + (int) (sampler->tblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses the same logic as ANALYZE for picking the random blocks.
+ */
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1;
+ }
+
+ *pages = baserel->pages * samplesize;
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 939d93d..9dabcb0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -115,6 +115,7 @@ extern HeapScanDesc heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key);
extern void heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk,
BlockNumber endBlk);
+extern void heapgetpage(HeapScanDesc scan, BlockNumber page);
extern void heap_rescan(HeapScanDesc scan, ScanKey key);
extern void heap_endscan(HeapScanDesc scan);
extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 9bb6362..e2b2b4f 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -29,6 +29,7 @@ typedef struct HeapScanDescData
int rs_nkeys; /* number of scan keys */
ScanKey rs_key; /* array of scan key descriptors */
bool rs_bitmapscan; /* true if this is really a bitmap scan */
+ bool rs_samplescan; /* true if this is really a sample scan */
bool rs_pageatatime; /* verify visibility page-at-a-time? */
bool rs_allow_strat; /* allow or disallow use of access strategy */
bool rs_allow_sync; /* allow or disallow use of syncscan */
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..c711cca 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_name_index, 3281, on pg_tablesample_method using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3281
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_oid_index, 3282, on pg_tablesample_method using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3282
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 4268b99..756911c 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5145,6 +5145,31 @@ DESCR("rank of hypothetical row without gaps");
DATA(insert OID = 3993 ( dense_rank_final PGNSP PGUID 12 1 0 2276 0 f f f f f f i 2 0 20 "2281 2276" "{2281,2276}" "{i,v}" _null_ _null_ hypothetical_dense_rank_final _null_ _null_ _null_ ));
DESCR("aggregate final function");
+DATA(insert OID = 3285 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3286 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3287 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3288 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3289 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+DATA(insert OID = 3290 ( tsm_system_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_system_cost _null_ _null_ _null_ ));
+DESCR("tsm_system_cost(internal)");
+
+DATA(insert OID = 3291 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3292 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3293 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3294 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3296 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
+DATA(insert OID = 3297 ( tsm_bernoulli_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_cost(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
new file mode 100644
index 0000000..fd76f77
--- /dev/null
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesample_method.h
+ * definition of the table sample methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesample_method.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLE_METHOD_H
+#define PG_TABLESAMPLE_METHOD_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_tablesample_method definition. cpp turns this into
+ * typedef struct FormData_pg_tablesample_method
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3280
+
+CATALOG(pg_tablesample_method,3280)
+{
+ NameData tsmname; /* tablesample method name */
+ bool tsmseqscan; /* does this method scan whole table sequentially? */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockNumber if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmexaminetuple; /* optional function which can examine tuple contents and
+ decide if tuple should be returned or not */
+ regproc tsmend; /* end scan function */
+ regproc tsmreset; /* reset state - used by rescan */
+ regproc tsmcost; /* costing function */
+} FormData_pg_tablesample_method;
+
+/* ----------------
+ * Form_pg_tablesample_method corresponds to a pointer to a tuple with
+ * the format of pg_tablesample_method relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
+
+/* ----------------
+ * compiler constants for pg_tablesample_method
+ * ----------------
+ */
+#define Natts_pg_tablesample_method 9
+#define Anum_pg_tablesample_method_tsmname 1
+#define Anum_pg_tablesample_method_tsmseqscan 2
+#define Anum_pg_tablesample_method_tsminit 3
+#define Anum_pg_tablesample_method_tsmnextblock 4
+#define Anum_pg_tablesample_method_tsmnexttuple 5
+#define Anum_pg_tablesample_method_tsmexaminetuple 6
+#define Anum_pg_tablesample_method_tsmend 7
+#define Anum_pg_tablesample_method_tsmreset 8
+#define Anum_pg_tablesample_method_tsmcost 9
+
+/* ----------------
+ * initial contents of pg_tablesample_method
+ * ----------------
+ */
+
+DATA(insert OID = 3283 ( system false tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3284 ( bernoulli true tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41288ed..e913924 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1212,6 +1212,24 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+
+ /* Sampling method functions. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmexaminetuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+
+ void *tsmdata; /* private state for use by the table sampling method */
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 97ef0fc..3276be8 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,6 +61,7 @@ typedef enum NodeTag
T_ValuesScan,
T_CteScan,
T_WorkTableScan,
+ T_SampleScan,
T_ForeignScan,
T_CustomScan,
T_Join,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -413,6 +415,8 @@ typedef enum NodeTag
T_XmlSerialize,
T_WithClause,
T_CommonTableExpr,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b1dfa85..2f4df1d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -307,6 +307,25 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - information about a sampling method
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ bool tsmseqscan;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmexaminetuple;
+ Oid tsmend;
+ Oid tsmreset;
+ Oid tsmcost;
+ Node *repeatable;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -507,6 +526,21 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than the SQL standard, so we pass generic function
+ * arguments to the sampling method.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ Node *repeatable;
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -751,6 +785,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f6683f0..5289c43 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -279,6 +279,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..24003ae 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9923f0e..89c8ded 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c243ec..6ff7b44 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -312,7 +312,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD)
PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD)
PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD)
-PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD)
+PG_KEYWORD("repeatable", REPEATABLE, RESERVED_KEYWORD)
PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD)
PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD)
PG_KEYWORD("reset", RESET, UNRESERVED_KEYWORD)
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 3264691..6727e55 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,10 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod,
+ Node *repeatable, List *args);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 6bd786d..185bd81 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index e3e7f9c..4ac208d 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -15,7 +15,12 @@
#include "storage/bufmgr.h"
-extern double sampler_random_fract(void);
+/* Random generator for sampling code */
+typedef unsigned short SamplerRandomState[3];
+
+extern void sampler_random_init_state(long seed,
+ SamplerRandomState randstate);
+extern double sampler_random_fract(SamplerRandomState randstate);
/* Block sampling methods */
/* Data structure for Algorithm S from Knuth 3.4.2 */
@@ -25,6 +30,7 @@ typedef struct
int n; /* desired sample size */
BlockNumber t; /* current block number */
int m; /* blocks selected so far */
+ SamplerRandomState randstate; /* random generator state */
} BlockSamplerData;
typedef BlockSamplerData *BlockSampler;
@@ -35,7 +41,12 @@ extern bool BlockSampler_HasMore(BlockSampler bs);
extern BlockNumber BlockSampler_Next(BlockSampler bs);
/* Reservoid sampling methods */
-typedef double ReservoirStateData;
+typedef struct
+{
+ double W;
+ SamplerRandomState randstate; /* random generator state */
+} ReservoirStateData;
+
typedef ReservoirStateData *ReservoirState;
extern void reservoir_init_selection_state(ReservoirState rs, int n);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..6b628f6 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/include/utils/tablesample.h b/src/include/utils/tablesample.h
new file mode 100644
index 0000000..1a24cb6
--- /dev/null
+++ b/src/include/utils/tablesample.h
@@ -0,0 +1,27 @@
+/*--------------------------------------------------------------------------
+ * tablesample.h
+ * Header file for builtin table sampling methods.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/utils/tablesample.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TABLESAMPLE_H
+#define TABLESAMPLE_H
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_system_cost(PG_FUNCTION_ARGS);
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_cost(PG_FUNCTION_ARGS);
+
+#endif /* TABLESAMPLE_H */
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..5946edf 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesample_method|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..ce9abf7
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,168 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+ count
+-------
+ 10
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+ pg_get_viewdef
+--------------------------------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system (((10 * 2))::real) REPEATABLE (2);
+(1 row)
+
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+ pg_get_viewdef
+-----------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system ((99)::real);
+(1 row)
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+CLOSE tablesample_cur;
+END;
+-- should fail
+SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1);
+ERROR: TABLESAMPLE clause can only be used on tables and materialized views
+DROP TABLE test_tablesample CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to view test_tablesample_v1
+drop cascades to view test_tablesample_v2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e0ae2f2..e0240ac 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address tablesample
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 7f762bd..9a7611b 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -152,3 +152,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..0d8ce39
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,42 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+CLOSE tablesample_cur;
+END;
+
+-- should fail
+SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1);
+
+DROP TABLE test_tablesample CASCADE;
--
1.9.1
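
For readers following the sampling.h change and the REPEATABLE tests above, here is a rough standalone sketch of why a per-scan random state makes the sample reproducible. It is not code from the patch: it assumes plain POSIX erand48() in place of the backend's generator, and sketch_init_state() and sketch_bernoulli() are made-up names. How the patch really derives the state from the seed is not shown in this part of the diff.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>

/* Same shape as SamplerRandomState in the patch: the 48-bit erand48() state. */
typedef unsigned short SketchRandomState[3];

/* Hypothetical seeding helper; the patch's real seeding may differ. */
static void
sketch_init_state(long seed, SketchRandomState st)
{
    st[0] = 0x330e;
    st[1] = (unsigned short) seed;
    st[2] = (unsigned short) (seed >> 16);
}

/*
 * Bernoulli-style pass over ntuples items: each one is kept with
 * probability percent/100. picked[i] is set to 1 if item i is kept,
 * 0 otherwise. Returns the number of items kept.
 */
static int
sketch_bernoulli(long seed, double percent, int ntuples, int *picked)
{
    SketchRandomState st;
    int         kept = 0;
    int         i;

    assert(percent >= 0.0 && percent <= 100.0);

    /* State lives in the scan, not in a global, so scans don't interfere. */
    sketch_init_state(seed, st);

    for (i = 0; i < ntuples; i++)
    {
        /* erand48() returns a double in [0.0, 1.0) */
        picked[i] = erand48(st) < percent / 100.0;
        kept += picked[i];
    }
    return kept;
}
```

Calling sketch_bernoulli() twice with the same seed yields the same picked[] array, which is the property the cursor rewind test depends on; a global generator shared with an unrelated query in between would not guarantee that.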
0003-tablesample-ddl-v4.patch (text/x-diff)
From 01ad447b67cb860fd1ded19b736c8955a80ce5a8 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:51:44 +0100
Subject: [PATCH 3/3] tablesample-ddl v4
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_tablesamplemethod.sgml | 173 +++++++++
doc/src/sgml/ref/drop_tablesamplemethod.sgml | 87 +++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/dependency.c | 15 +-
src/backend/catalog/objectaddress.c | 65 +++-
src/backend/commands/Makefile | 6 +-
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/tablecmds.c | 1 +
src/backend/commands/tablesample.c | 422 +++++++++++++++++++++
src/backend/parser/gram.y | 14 +-
src/backend/tcop/utility.c | 12 +
src/bin/pg_dump/common.c | 5 +
src/bin/pg_dump/pg_dump.c | 171 +++++++++
src/bin/pg_dump/pg_dump.h | 10 +-
src/bin/pg_dump/pg_dump_sort.c | 11 +-
src/include/catalog/dependency.h | 1 +
src/include/catalog/pg_tablesample_method.h | 11 +
src/include/nodes/parsenodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/modules/Makefile | 3 +-
src/test/modules/tablesample/.gitignore | 4 +
src/test/modules/tablesample/Makefile | 21 +
.../modules/tablesample/expected/tablesample.out | 38 ++
src/test/modules/tablesample/sql/tablesample.sql | 14 +
src/test/modules/tablesample/tsm_test--1.0.sql | 51 +++
src/test/modules/tablesample/tsm_test.c | 228 +++++++++++
src/test/modules/tablesample/tsm_test.control | 5 +
29 files changed, 1370 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/create_tablesamplemethod.sgml
create mode 100644 doc/src/sgml/ref/drop_tablesamplemethod.sgml
create mode 100644 src/backend/commands/tablesample.c
create mode 100644 src/test/modules/tablesample/.gitignore
create mode 100644 src/test/modules/tablesample/Makefile
create mode 100644 src/test/modules/tablesample/expected/tablesample.out
create mode 100644 src/test/modules/tablesample/sql/tablesample.sql
create mode 100644 src/test/modules/tablesample/tsm_test--1.0.sql
create mode 100644 src/test/modules/tablesample/tsm_test.c
create mode 100644 src/test/modules/tablesample/tsm_test.control
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 7aa3128..2fad084 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -78,6 +78,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createServer SYSTEM "create_server.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
+<!ENTITY createTablesampleMethod SYSTEM "create_tablesamplemethod.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
<!ENTITY createTrigger SYSTEM "create_trigger.sgml">
<!ENTITY createTSConfig SYSTEM "create_tsconfig.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
+<!ENTITY dropTablesampleMethod SYSTEM "drop_tablesamplemethod.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTrigger SYSTEM "drop_trigger.sgml">
<!ENTITY dropTSConfig SYSTEM "drop_tsconfig.sgml">
diff --git a/doc/src/sgml/ref/create_tablesamplemethod.sgml b/doc/src/sgml/ref/create_tablesamplemethod.sgml
new file mode 100644
index 0000000..62a8ce4
--- /dev/null
+++ b/doc/src/sgml/ref/create_tablesamplemethod.sgml
@@ -0,0 +1,173 @@
+<!--
+doc/src/sgml/ref/create_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATETABLESAMPLEMETHOD">
+ <indexterm zone="sql-createtablesamplemethod">
+ <primary>CREATE TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE TABLESAMPLE METHOD</refname>
+ <refpurpose>define a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE TABLESAMPLE METHOD <replaceable class="parameter">name</replaceable> (
+ INIT = <replaceable class="parameter">init_function</replaceable> ,
+ NEXTBLOCK = <replaceable class="parameter">nextblock_function</replaceable> ,
+ NEXTTUPLE = <replaceable class="parameter">nexttuple_function</replaceable> ,
+ END = <replaceable class="parameter">end_function</replaceable> ,
+ RESET = <replaceable class="parameter">reset_function</replaceable> ,
+ COST = <replaceable class="parameter">cost_function</replaceable>
+ [ , EXAMINETUPLE = <replaceable class="parameter">examinetuple_function</replaceable> ]
+ [ , SEQSCAN = <replaceable class="parameter">seqscan</replaceable> ]
+)
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>CREATE TABLESAMPLE METHOD</command> creates a tablesample method.
+ A tablesample method provides an algorithm for reading a sample of a table
+ when used in the <command>TABLESAMPLE</> clause of a <command>SELECT</>
+ statement.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>CREATE TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of the tablesample method to be created. This name must be
+ unique within the database.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">init_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the init function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nextblock_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-block function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nexttuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-tuple function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">end_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the end function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">reset_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the reset function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">cost_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the costing function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">examinetuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the function that inspects the tuple contents in order
+ to decide whether the tuple should be returned. This parameter
+ is optional.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">seqscan</replaceable></term>
+ <listitem>
+ <para>
+ True if the sampling method does a sequential scan of the whole table.
+ Used for cost estimation and syncscan. If not specified, the default
+ is False.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>
+ The function names can be schema-qualified if necessary. Argument types
+ are not given, since the argument list for each type of function is
+ predetermined. All functions except <replaceable class="parameter">examinetuple_function</replaceable> are required.
+ </para>
+
+ <para>
+ The arguments can appear in any order, not only the one shown above.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>CREATE TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-droptablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_tablesamplemethod.sgml b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
new file mode 100644
index 0000000..dffd2ec
--- /dev/null
+++ b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
@@ -0,0 +1,87 @@
+<!--
+doc/src/sgml/ref/drop_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPTABLESAMPLEMETHOD">
+ <indexterm zone="sql-droptablesamplemethod">
+ <primary>DROP TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP TABLESAMPLE METHOD</refname>
+ <refpurpose>remove a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP TABLESAMPLE METHOD [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP TABLESAMPLE METHOD</command> drops an existing tablesample
+ method.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>DROP TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the tablesample method does not exist.
+ A notice is issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an existing tablesample method to be removed.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>DROP TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createtablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 10c9a6d..2c09a3c 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -106,6 +106,7 @@
&createServer;
&createTable;
&createTableAs;
+ &createTablesampleMethod;
&createTableSpace;
&createTSConfig;
&createTSDictionary;
@@ -147,6 +148,7 @@
&dropSequence;
&dropServer;
&dropTable;
+ &dropTablesampleMethod;
&dropTableSpace;
&dropTSConfig;
&dropTSDictionary;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index bacb242..6acb5b3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -46,6 +46,7 @@
#include "catalog/pg_policy.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -157,7 +158,8 @@ static const Oid object_classes[MAX_OCLASS] = {
DefaultAclRelationId, /* OCLASS_DEFACL */
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
- PolicyRelationId /* OCLASS_POLICY */
+ PolicyRelationId, /* OCLASS_POLICY */
+ TableSampleMethodRelationId /* OCLASS_TABLESAMPLEMETHOD */
};
@@ -1265,6 +1267,10 @@ doDeletion(const ObjectAddress *object, int flags)
RemovePolicyById(object->objectId);
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ RemoveTablesampleMethodById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -1794,6 +1800,10 @@ find_expr_references_walker(Node *node,
case RTE_RELATION:
add_object_address(OCLASS_CLASS, rte->relid, 0,
context->addrs);
+ if (rte->tablesample)
+ add_object_address(OCLASS_TABLESAMPLEMETHOD,
+ rte->tablesample->tsmid, 0,
+ context->addrs);
break;
default:
break;
@@ -2373,6 +2383,9 @@ getObjectClass(const ObjectAddress *object)
case PolicyRelationId:
return OCLASS_POLICY;
+
+ case TableSampleMethodRelationId:
+ return OCLASS_TABLESAMPLEMETHOD;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index d899dd7..6d8d129 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -44,6 +44,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_policy.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -429,7 +430,19 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
- }
+ },
+ {
+ TableSampleMethodRelationId,
+ TableSampleMethodOidIndexId,
+ TABLESAMPLEMETHODOID,
+ TABLESAMPLEMETHODNAME,
+ Anum_pg_tablesample_method_tsmname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ true
+ },
};
/*
@@ -528,7 +541,9 @@ ObjectTypeMap[] =
/* OCLASS_EVENT_TRIGGER */
{ "event trigger", OBJECT_EVENT_TRIGGER },
/* OCLASS_POLICY */
- { "policy", OBJECT_POLICY }
+ { "policy", OBJECT_POLICY },
+ /* OCLASS_TABLESAMPLEMETHOD */
+ { "tablesample method", OBJECT_TABLESAMPLEMETHOD }
};
@@ -670,6 +685,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_FDW:
case OBJECT_FOREIGN_SERVER:
case OBJECT_EVENT_TRIGGER:
+ case OBJECT_TABLESAMPLEMETHOD:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -896,6 +912,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_EVENT_TRIGGER:
msg = gettext_noop("event trigger name cannot be qualified");
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -956,6 +975,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_event_trigger_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ address.classId = TableSampleMethodRelationId;
+ address.objectId = get_tablesample_method_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -1720,6 +1744,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
break;
case OBJECT_TSPARSER:
case OBJECT_TSTEMPLATE:
+ case OBJECT_TABLESAMPLEMETHOD:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -2654,6 +2679,21 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("tablesample method %s"),
+ NameStr(((Form_pg_tablesample_method) GETSTRUCT(tup))->tsmname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -3131,6 +3171,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "policy");
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ appendStringInfoString(&buffer, "tablesample method");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4029,6 +4073,23 @@ getObjectIdentityParts(const ObjectAddress *object,
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+ Form_pg_tablesample_method tsmForm;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ tsmForm = (Form_pg_tablesample_method) GETSTRUCT(tup);
+ appendStringInfoString(&buffer,
+ quote_identifier(NameStr(tsmForm->tsmname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..04fcd8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o tablecmds.o tablesample.o \
+ tablespace.o trigger.o tsearchcmds.o typecmds.o user.o vacuum.o \
+ vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index e5185ba..04d29a2 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -421,6 +421,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
args = strVal(linitial(objargs));
}
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
default:
elog(ERROR, "unexpected object type (%d)", (int) objtype);
break;
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index a33a5ad..f20e9f7 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -97,6 +97,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SEQUENCE", true},
{"SERVER", true},
{"TABLE", true},
+ {"TABLESAMPLE METHOD", true},
{"TABLESPACE", false},
{"TRIGGER", true},
{"TEXT SEARCH CONFIGURATION", true},
@@ -1078,6 +1079,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_SEQUENCE:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
+ case OBJECT_TABLESAMPLEMETHOD:
case OBJECT_TRIGGER:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
@@ -1134,6 +1136,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_TABLESAMPLEMETHOD:
return true;
case MAX_OCLASS:
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b2993b8..1e408bc 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8062,6 +8062,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
case OCLASS_USER_MAPPING:
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
+ case OCLASS_TABLESAMPLEMETHOD:
/*
* We don't expect any of these sorts of objects to depend on
diff --git a/src/backend/commands/tablesample.c b/src/backend/commands/tablesample.c
new file mode 100644
index 0000000..ed8102d
--- /dev/null
+++ b/src/backend/commands/tablesample.c
@@ -0,0 +1,422 @@
+/*-------------------------------------------------------------------------
+ *
+ * tablesample.c
+ * Commands to manipulate tablesample methods
+ *
+ * Table sampling methods provide algorithms for performing a sample scan
+ * over a table.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/tablesample.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "parser/parse_func.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+
+
+static Datum
+get_tablesample_method_func(DefElem *defel, int attnum)
+{
+ List *funcName = defGetQualifiedName(defel);
+ /* Big enough size for our needs. */
+ Oid *typeId = palloc0(7 * sizeof(Oid));
+ Oid retTypeId;
+ int nargs;
+ Oid procOid = InvalidOid;
+ FuncCandidateList clist;
+
+ switch (attnum)
+ {
+ case Anum_pg_tablesample_method_tsminit:
+ /*
+ * tsminit needs special handling because it is defined as a function
+ * with 3 or more arguments, of which only the first two must have
+ * specific types; the rest are up to the tablesample method creator.
+ */
+ {
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ retTypeId = VOIDOID;
+
+ clist = FuncnameGetCandidates(funcName, -1, NIL, false, false, false);
+
+ while (clist)
+ {
+ if (clist->nargs >= 3 &&
+ memcmp(typeId, clist->args, nargs * sizeof(Oid)) == 0)
+ {
+ procOid = clist->oid;
+ /* Save real function signature for future errors. */
+ nargs = clist->nargs;
+ pfree(typeId);
+ typeId = clist->args;
+ break;
+ }
+ clist = clist->next;
+ }
+
+ if (!OidIsValid(procOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("function \"%s\" does not exist or does not have a valid signature",
+ NameListToString(funcName)),
+ errhint("The tablesample method init function "
+ "must have at least 3 input parameters, "
+ "the first of type INTERNAL and the second of type INTEGER.")));
+ }
+ break;
+
+ case Anum_pg_tablesample_method_tsmnextblock:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = INT4OID;
+ break;
+ case Anum_pg_tablesample_method_tsmnexttuple:
+ nargs = 3;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INT2OID;
+ retTypeId = INT2OID;
+ break;
+ case Anum_pg_tablesample_method_tsmexaminetuple:
+ nargs = 4;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = BOOLOID;
+ retTypeId = BOOLOID;
+ break;
+ case Anum_pg_tablesample_method_tsmend:
+ case Anum_pg_tablesample_method_tsmreset:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ case Anum_pg_tablesample_method_tsmcost:
+ nargs = 7;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INTERNALOID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = INTERNALOID;
+ typeId[4] = INTERNALOID;
+ typeId[5] = INTERNALOID;
+ typeId[6] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ default:
+ /* should not be here */
+ elog(ERROR, "unrecognized attribute for tablesample method: %d",
+ attnum);
+ nargs = 0; /* keep compiler quiet */
+ }
+
+ if (!OidIsValid(procOid))
+ procOid = LookupFuncName(funcName, nargs, typeId, false);
+ if (get_func_rettype(procOid) != retTypeId)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("function %s should return type %s",
+ func_signature_string(funcName, nargs, NIL, typeId),
+ format_type_be(retTypeId))));
+
+ return ObjectIdGetDatum(procOid);
+}
+
+/*
+ * make pg_depend entries for a new pg_tablesample_method entry
+ */
+static void
+makeTablesampleMethodDeps(HeapTuple tuple)
+{
+ Form_pg_tablesample_method tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ ObjectAddress myself,
+ referenced;
+
+ myself.classId = TableSampleMethodRelationId;
+ myself.objectId = HeapTupleGetOid(tuple);
+ myself.objectSubId = 0;
+
+ /* dependency on extension */
+ recordDependencyOnCurrentExtension(&myself, false);
+
+ /* dependencies on functions */
+ referenced.classId = ProcedureRelationId;
+ referenced.objectSubId = 0;
+
+ referenced.objectId = tsm->tsminit;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnextblock;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnexttuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ if (OidIsValid(tsm->tsmexaminetuple))
+ {
+ referenced.objectId = tsm->tsmexaminetuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ }
+
+ referenced.objectId = tsm->tsmend;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmreset;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmcost;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+}
+
+/*
+ * Create a table sampling method
+ *
+ * Only superusers can create table sampling methods.
+ */
+Oid
+DefineTablesampleMethod(List *names, List *parameters)
+{
+ char *tsmname = strVal(linitial(names));
+ Oid tsmoid;
+ ListCell *pl;
+ Relation rel;
+ Datum values[Natts_pg_tablesample_method];
+ bool nulls[Natts_pg_tablesample_method];
+ HeapTuple tuple;
+
+ /* Must be superuser. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to create tablesample method \"%s\"",
+ tsmname),
+ errhint("Must be superuser to create a tablesample method.")));
+
+ /* Must not already exist. */
+ tsmoid = get_tablesample_method_oid(tsmname, true);
+ if (OidIsValid(tsmoid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("tablesample method \"%s\" already exists",
+ tsmname)));
+
+ /* Initialize the values. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_tablesample_method_tsmname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(tsmname));
+
+ /*
+ * loop over the definition list and extract the information we need.
+ */
+ foreach(pl, parameters)
+ {
+ DefElem *defel = (DefElem *) lfirst(pl);
+
+ if (pg_strcasecmp(defel->defname, "seqscan") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmseqscan - 1] =
+ BoolGetDatum(defGetBoolean(defel));
+ }
+ else if (pg_strcasecmp(defel->defname, "init") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsminit - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsminit);
+ }
+ else if (pg_strcasecmp(defel->defname, "nextblock") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnextblock - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnextblock);
+ }
+ else if (pg_strcasecmp(defel->defname, "nexttuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnexttuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnexttuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "examinetuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmexaminetuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmexaminetuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "end") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmend - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmend);
+ }
+ else if (pg_strcasecmp(defel->defname, "reset") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmreset - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmreset);
+ }
+ else if (pg_strcasecmp(defel->defname, "cost") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmcost - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmcost);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("tablesample method parameter \"%s\" not recognized",
+ defel->defname)));
+ }
+
+ /*
+ * Validation.
+ */
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsminit - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method init function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnextblock - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nextblock function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnexttuple - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nexttuple function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmend - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method end function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmreset - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method reset function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmcost - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method cost function is required")));
+
+ /*
+ * Insert tuple into pg_tablesample_method.
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = heap_form_tuple(rel->rd_att, values, nulls);
+
+ tsmoid = simple_heap_insert(rel, tuple);
+
+ CatalogUpdateIndexes(rel, tuple);
+
+ makeTablesampleMethodDeps(tuple);
+
+ heap_freetuple(tuple);
+
+ /* Post creation hook for new tablesample method */
+ InvokeObjectPostCreateHook(TableSampleMethodRelationId, tsmoid, 0);
+
+ heap_close(rel, RowExclusiveLock);
+
+ return tsmoid;
+}
+
+/*
+ * Drop a tablesample method.
+ */
+void
+RemoveTablesampleMethodById(Oid tsmoid)
+{
+ Relation rel;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+
+ /*
+ * Find the target tuple
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tsmoid));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ tsmoid);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ /* Can't drop builtin tablesample methods. */
+ if (tsmoid == TABLESAMPLE_METHOD_SYSTEM_OID ||
+ tsmoid == TABLESAMPLE_METHOD_BERNOULLI_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied for tablesample method %s",
+ NameStr(tsm->tsmname))));
+
+ /*
+ * Remove the pg_tablesample_method tuple (this will roll back if we fail below)
+ */
+ simple_heap_delete(rel, &tuple->t_self);
+
+ ReleaseSysCache(tuple);
+
+ heap_close(rel, RowExclusiveLock);
+}
+
+/*
+ * get_tablesample_method_oid - given a tablesample method name,
+ * look up the OID
+ *
+ * If missing_ok is false, throw an error if the tablesample method is not
+ * found. If true, just return InvalidOid.
+ */
+Oid
+get_tablesample_method_oid(const char *tsmname, bool missing_ok)
+{
+ Oid result;
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(tsmname));
+ if (HeapTupleIsValid(tuple))
+ {
+ result = HeapTupleGetOid(tuple);
+ ReleaseSysCache(tuple);
+ }
+ else
+ result = InvalidOid;
+
+ if (!OidIsValid(result) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ tsmname)));
+
+ return result;
+}
+
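The tsminit lookup in get_tablesample_method_func() above is the one place where the signature is matched by prefix instead of exactly. Below is a minimal standalone sketch of that rule, not the backend code itself: the Candidate struct only loosely mimics FuncCandidateList, and the SKETCH_* type identifiers and find_init_candidate() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;

/* Hypothetical stand-ins for type OIDs; the values are illustrative only. */
enum
{
	SKETCH_INTERNALOID = 1,
	SKETCH_INT4OID = 2,
	SKETCH_FLOAT4OID = 3
};

/* One candidate signature, loosely modeled on FuncCandidateList. */
typedef struct Candidate
{
	Oid			oid;		/* function identifier */
	int			nargs;		/* number of declared arguments */
	const Oid  *args;		/* argument type identifiers */
	const struct Candidate *next;
} Candidate;

/*
 * Return the first candidate that has at least 3 arguments and whose first
 * two argument types are (internal, int4).  Returns 0, standing in for
 * InvalidOid, when no candidate qualifies.
 */
static Oid
find_init_candidate(const Candidate *clist)
{
	static const Oid prefix[2] = {SKETCH_INTERNALOID, SKETCH_INT4OID};

	for (; clist != NULL; clist = clist->next)
	{
		assert(clist->args != NULL);

		/* Only the two-element prefix is compared; the rest is free-form. */
		if (clist->nargs >= 3 &&
			memcmp(prefix, clist->args, sizeof(prefix)) == 0)
			return clist->oid;
	}

	return 0;
}
```

In this sketch a candidate taking (internal, int4, float4) is accepted while one taking only (internal, int4) is rejected, which matches the "at least 3 input parameters" hint in the error message.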
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ac5e095..4578b5e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -586,7 +586,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
- MAPPING MATCH MATERIALIZED MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
+ MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P
+ MOVE
NAME_P NAMES NATIONAL NATURAL NCHAR NEXT NO NONE
NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -5094,6 +5095,15 @@ DefineStmt:
n->definition = list_make1(makeDefElem("from", (Node *) $5));
$$ = (Node *)n;
}
+ | CREATE TABLESAMPLE METHOD name definition
+ {
+ DefineStmt *n = makeNode(DefineStmt);
+ n->kind = OBJECT_TABLESAMPLEMETHOD;
+ n->args = NIL;
+ n->defnames = list_make1(makeString($4));
+ n->definition = $5;
+ $$ = (Node *)n;
+ }
;
definition: '(' def_list ')' { $$ = $2; }
@@ -5552,6 +5562,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | TABLESAMPLE METHOD { $$ = OBJECT_TABLESAMPLEMETHOD; }
;
any_name_list:
@@ -13313,6 +13324,7 @@ unreserved_keyword:
| MATCH
| MATERIALIZED
| MAXVALUE
+ | METHOD
| MINUTE_P
| MINVALUE
| MODE
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 3533cfa..532256d 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -23,6 +23,7 @@
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/namespace.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/toasting.h"
#include "commands/alter.h"
#include "commands/async.h"
@@ -1106,6 +1107,11 @@ ProcessUtilitySlow(Node *parsetree,
Assert(stmt->args == NIL);
DefineCollation(stmt->defnames, stmt->definition);
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ Assert(stmt->args == NIL);
+ Assert(list_length(stmt->defnames) == 1);
+ DefineTablesampleMethod(stmt->defnames, stmt->definition);
+ break;
default:
elog(ERROR, "unrecognized define stmt type: %d",
(int) stmt->kind);
@@ -1960,6 +1966,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_POLICY:
tag = "DROP POLICY";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "DROP TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
@@ -2056,6 +2065,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_COLLATION:
tag = "CREATE COLLATION";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "CREATE TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 1a0a587..8a64e4b 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -103,6 +103,7 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
int numForeignServers;
int numDefaultACLs;
int numEventTriggers;
+ int numTSMs;
if (g_verbose)
write_msg(NULL, "reading schemas\n");
@@ -251,6 +252,10 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
write_msg(NULL, "reading policies\n");
getPolicies(fout, tblinfo, numTables);
+ if (g_verbose)
+ write_msg(NULL, "reading tablesample methods\n");
+ getTableSampleMethods(fout, &numTSMs);
+
*numTablesPtr = numTables;
return tblinfo;
}
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 2b53c72..0af75f4 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -182,6 +182,7 @@ static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo);
static void dumpIndex(Archive *fout, DumpOptions *dopt, IndxInfo *indxinfo);
static void dumpConstraint(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
static void dumpTableConstraintComment(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
+static void dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo);
static void dumpTSParser(Archive *fout, DumpOptions *dopt, TSParserInfo *prsinfo);
static void dumpTSDictionary(Archive *fout, DumpOptions *dopt, TSDictInfo *dictinfo);
static void dumpTSTemplate(Archive *fout, DumpOptions *dopt, TSTemplateInfo *tmplinfo);
@@ -7177,6 +7178,75 @@ getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tblinfo, int numTable
}
/*
+ * getTableSampleMethods:
+ * read all tablesample methods in the system catalogs and return them
+ * in the TSMInfo* structure
+ *
+ * numTSMs is set to the number of tablesample methods read in
+ */
+TSMInfo *
+getTableSampleMethods(Archive *fout, int *numTSMs)
+{
+ PGresult *res;
+ int ntups;
+ int i;
+ PQExpBuffer query;
+ TSMInfo *tsminfo;
+ int i_tableoid,
+ i_oid,
+ i_tsmname,
+ i_tsmseqscan;
+
+ /* Before 9.5, there were no tablesample methods */
+ if (fout->remoteVersion < 90500)
+ {
+ *numTSMs = 0;
+ return NULL;
+ }
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT tableoid, oid, tsmname, tsmseqscan "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid >= '%u'::pg_catalog.oid",
+ FirstNormalObjectId);
+
+ res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+ ntups = PQntuples(res);
+ *numTSMs = ntups;
+
+ tsminfo = (TSMInfo *) pg_malloc(ntups * sizeof(TSMInfo));
+
+ i_tableoid = PQfnumber(res, "tableoid");
+ i_oid = PQfnumber(res, "oid");
+ i_tsmname = PQfnumber(res, "tsmname");
+ i_tsmseqscan = PQfnumber(res, "tsmseqscan");
+
+ for (i = 0; i < ntups; i++)
+ {
+ tsminfo[i].dobj.objType = DO_TABLESAMPLE_METHOD;
+ tsminfo[i].dobj.catId.tableoid = atooid(PQgetvalue(res, i, i_tableoid));
+ tsminfo[i].dobj.catId.oid = atooid(PQgetvalue(res, i, i_oid));
+ AssignDumpId(&tsminfo[i].dobj);
+ tsminfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_tsmname));
+ tsminfo[i].dobj.namespace = NULL;
+ tsminfo[i].tsmseqscan = PQgetvalue(res, i, i_tsmseqscan)[0] == 't';
+
+ /* Decide whether we want to dump it */
+ selectDumpableObject(&(tsminfo[i].dobj));
+ }
+
+ PQclear(res);
+
+ destroyPQExpBuffer(query);
+
+ return tsminfo;
+}
+
+
+/*
* Test whether a column should be printed as part of table's CREATE TABLE.
* Column number is zero-based.
*
@@ -8269,6 +8339,9 @@ dumpDumpableObject(Archive *fout, DumpOptions *dopt, DumpableObject *dobj)
case DO_POLICY:
dumpPolicy(fout, dopt, (PolicyInfo *) dobj);
break;
+ case DO_TABLESAMPLE_METHOD:
+ dumpTableSampleMethod(fout, dopt, (TSMInfo *) dobj);
+ break;
case DO_PRE_DATA_BOUNDARY:
case DO_POST_DATA_BOUNDARY:
/* never dumped, nothing to do */
@@ -12269,6 +12342,103 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
}
/*
+ * dumpTableSampleMethod
+ * write the declaration of one user-defined tablesample method
+ */
+static void
+dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo)
+{
+ PGresult *res;
+ PQExpBuffer q;
+ PQExpBuffer delq;
+ PQExpBuffer labelq;
+ PQExpBuffer query;
+ char *tsminit;
+ char *tsmnextblock;
+ char *tsmnexttuple;
+ char *tsmexaminetuple;
+ char *tsmend;
+ char *tsmreset;
+ char *tsmcost;
+
+ /* Skip if not to be dumped */
+ if (!tsminfo->dobj.dump || dopt->dataOnly)
+ return;
+
+ q = createPQExpBuffer();
+ delq = createPQExpBuffer();
+ labelq = createPQExpBuffer();
+ query = createPQExpBuffer();
+
+ /* Make sure we are in proper schema */
+ selectSourceSchema(fout, "pg_catalog");
+
+ appendPQExpBuffer(query, "SELECT tsminit, tsmnextblock, "
+ "tsmnexttuple, tsmexaminetuple, "
+ "tsmend, tsmreset, tsmcost "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid = '%u'::pg_catalog.oid",
+ tsminfo->dobj.catId.oid);
+
+ res = ExecuteSqlQueryForSingleRow(fout, query->data);
+
+ tsminit = PQgetvalue(res, 0, PQfnumber(res, "tsminit"));
+ tsmnexttuple = PQgetvalue(res, 0, PQfnumber(res, "tsmnexttuple"));
+ tsmnextblock = PQgetvalue(res, 0, PQfnumber(res, "tsmnextblock"));
+ tsmexaminetuple = PQgetvalue(res, 0, PQfnumber(res, "tsmexaminetuple"));
+ tsmend = PQgetvalue(res, 0, PQfnumber(res, "tsmend"));
+ tsmreset = PQgetvalue(res, 0, PQfnumber(res, "tsmreset"));
+ tsmcost = PQgetvalue(res, 0, PQfnumber(res, "tsmcost"));
+
+ appendPQExpBuffer(q, "CREATE TABLESAMPLE METHOD %s (\n",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(q, " INIT = %s,\n", tsminit);
+ appendPQExpBuffer(q, " NEXTTUPLE = %s,\n", tsmnexttuple);
+ appendPQExpBuffer(q, " NEXTBLOCK = %s,\n", tsmnextblock);
+ appendPQExpBuffer(q, " END = %s,\n", tsmend);
+ appendPQExpBuffer(q, " RESET = %s,\n", tsmreset);
+ appendPQExpBuffer(q, " COST = %s", tsmcost);
+
+ if (strcmp(tsmexaminetuple, "-") != 0)
+ appendPQExpBuffer(q, ",\n EXAMINETUPLE = %s", tsmexaminetuple);
+
+ if (tsminfo->tsmseqscan)
+ appendPQExpBufferStr(q, ",\n SEQSCAN = true");
+
+ appendPQExpBufferStr(q, "\n);");
+
+ appendPQExpBuffer(delq, "DROP TABLESAMPLE METHOD %s",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(labelq, "TABLESAMPLE METHOD %s",
+ fmtId(tsminfo->dobj.name));
+
+ if (dopt->binary_upgrade)
+ binary_upgrade_extension_member(q, &tsminfo->dobj, labelq->data);
+
+ ArchiveEntry(fout, tsminfo->dobj.catId, tsminfo->dobj.dumpId,
+ tsminfo->dobj.name,
+ NULL,
+ NULL,
+ "",
+ false, "TABLESAMPLE METHOD", SECTION_PRE_DATA,
+ q->data, delq->data, NULL,
+ NULL, 0,
+ NULL, NULL);
+
+ /* Dump Tablesample Method Comments */
+ dumpComment(fout, dopt, labelq->data,
+ NULL, "",
+ tsminfo->dobj.catId, 0, tsminfo->dobj.dumpId);
+
+ PQclear(res);
+ destroyPQExpBuffer(q);
+ destroyPQExpBuffer(delq);
+ destroyPQExpBuffer(labelq);
+}
+
+/*
* dumpTSParser
* write out a single text search parser
*/
@@ -15622,6 +15792,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
case DO_FDW:
case DO_FOREIGN_SERVER:
case DO_BLOB:
+ case DO_TABLESAMPLE_METHOD:
/* Pre-data objects: must come before the pre-data boundary */
addObjectDependency(preDataBound, dobj->dumpId);
break;
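For readers who want to see the statement that dumpTableSampleMethod() emits without reading the PQExpBuffer calls, here is a rough standalone sketch that assembles the same text with snprintf. The TsmDef struct, append_option() and format_create_tsm() are hypothetical names for illustration only, and unlike the patch the sketch does no identifier quoting (the patch uses fmtId for the method name).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened view of one pg_tablesample_method row. */
typedef struct TsmDef
{
	const char *name;
	const char *init;
	const char *nextblock;
	const char *nexttuple;
	const char *examinetuple;	/* "-" means not set, as regproc prints it */
	const char *end;
	const char *reset;
	const char *cost;
	bool		seqscan;
} TsmDef;

/* Append ",\n    KEY = value" at offset len; returns new length or -1. */
static int
append_option(char *buf, size_t bufsize, int len,
			  const char *key, const char *value)
{
	int			n;

	if (len < 0)
		return -1;
	n = snprintf(buf + len, bufsize - (size_t) len,
				 ",\n    %s = %s", key, value);
	if (n < 0 || (size_t) n >= bufsize - (size_t) len)
		return -1;
	return len + n;
}

/* Build the CREATE statement; returns its length, or -1 if buf is too small. */
static int
format_create_tsm(char *buf, size_t bufsize, const TsmDef *def)
{
	int			len;
	int			n;

	assert(buf != NULL && def != NULL);

	/* Required functions, in the order the patch writes them. */
	len = snprintf(buf, bufsize,
				   "CREATE TABLESAMPLE METHOD %s (\n    INIT = %s",
				   def->name, def->init);
	if (len < 0 || (size_t) len >= bufsize)
		return -1;
	len = append_option(buf, bufsize, len, "NEXTTUPLE", def->nexttuple);
	len = append_option(buf, bufsize, len, "NEXTBLOCK", def->nextblock);
	len = append_option(buf, bufsize, len, "END", def->end);
	len = append_option(buf, bufsize, len, "RESET", def->reset);
	len = append_option(buf, bufsize, len, "COST", def->cost);

	/* Optional parts are emitted only when they are set. */
	if (strcmp(def->examinetuple, "-") != 0)
		len = append_option(buf, bufsize, len, "EXAMINETUPLE",
							def->examinetuple);
	if (def->seqscan)
		len = append_option(buf, bufsize, len, "SEQSCAN", "true");
	if (len < 0)
		return -1;

	n = snprintf(buf + len, bufsize - (size_t) len, "\n);");
	if (n < 0 || (size_t) n >= bufsize - (size_t) len)
		return -1;
	return len + n;
}
```

The point of the sketch is the ordering: the six required functions always appear, while EXAMINETUPLE and SEQSCAN are appended only when the catalog row sets them.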
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index f42c42d..645c07c 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -76,7 +76,8 @@ typedef enum
DO_POST_DATA_BOUNDARY,
DO_EVENT_TRIGGER,
DO_REFRESH_MATVIEW,
- DO_POLICY
+ DO_POLICY,
+ DO_TABLESAMPLE_METHOD
} DumpableObjectType;
typedef struct _dumpableObject
@@ -384,6 +385,12 @@ typedef struct _inhInfo
Oid inhparent; /* OID of its parent */
} InhInfo;
+typedef struct _tsmInfo
+{
+ DumpableObject dobj;
+ bool tsmseqscan;
+} TSMInfo;
+
typedef struct _prsInfo
{
DumpableObject dobj;
@@ -537,6 +544,7 @@ extern ProcLangInfo *getProcLangs(Archive *fout, int *numProcLangs);
extern CastInfo *getCasts(Archive *fout, DumpOptions *dopt, int *numCasts);
extern void getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tbinfo, int numTables);
extern bool shouldPrintColumn(DumpOptions *dopt, TableInfo *tbinfo, int colno);
+extern TSMInfo *getTableSampleMethods(Archive *fout, int *numTSMs);
extern TSParserInfo *getTSParsers(Archive *fout, int *numTSParsers);
extern TSDictInfo *getTSDictionaries(Archive *fout, int *numTSDicts);
extern TSTemplateInfo *getTSTemplates(Archive *fout, int *numTSTemplates);
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 4b9bba0..cb009ce 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -73,7 +73,8 @@ static const int oldObjectTypePriority[] =
13, /* DO_POST_DATA_BOUNDARY */
20, /* DO_EVENT_TRIGGER */
15, /* DO_REFRESH_MATVIEW */
- 21 /* DO_POLICY */
+ 21, /* DO_POLICY */
+ 5 /* DO_TABLESAMPLE_METHOD */
};
/*
@@ -122,7 +123,8 @@ static const int newObjectTypePriority[] =
25, /* DO_POST_DATA_BOUNDARY */
32, /* DO_EVENT_TRIGGER */
33, /* DO_REFRESH_MATVIEW */
- 34 /* DO_POLICY */
+ 34, /* DO_POLICY */
+ 17 /* DO_TABLESAMPLE_METHOD */
};
static DumpId preDataBoundId;
@@ -1443,6 +1445,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
"POLICY (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
+ case DO_TABLESAMPLE_METHOD:
+ snprintf(buf, bufsize,
+ "TABLESAMPLE METHOD %s (ID %d OID %u)",
+ obj->name, obj->dumpId, obj->catId.oid);
+ return;
case DO_PRE_DATA_BOUNDARY:
snprintf(buf, bufsize,
"PRE-DATA BOUNDARY (ID %d)",
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 6481ac8..30653f8 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -148,6 +148,7 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
+ OCLASS_TABLESAMPLEMETHOD, /* pg_tablesample_method */
MAX_OCLASS /* MUST BE LAST */
} ObjectClass;
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
index fd76f77..6a55669 100644
--- a/src/include/catalog/pg_tablesample_method.h
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -69,7 +69,18 @@ typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
DATA(insert OID = 3283 ( system false tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
DESCR("SYSTEM table sampling method");
+#define TABLESAMPLE_METHOD_SYSTEM_OID 3283
DATA(insert OID = 3284 ( bernoulli true tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
DESCR("BERNOULLI table sampling method");
+#define TABLESAMPLE_METHOD_BERNOULLI_OID 3284
+
+/* ----------------
+ * functions for manipulation of pg_tablesample_method
+ * ----------------
+ */
+
+extern Oid DefineTablesampleMethod(List *names, List *parameters);
+extern void RemoveTablesampleMethodById(Oid tsmoid);
+extern Oid get_tablesample_method_oid(const char *tsmname, bool missing_ok);
#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2f4df1d..bc1fef2 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1269,6 +1269,7 @@ typedef enum ObjectType
OBJECT_SEQUENCE,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
+ OBJECT_TABLESAMPLEMETHOD,
OBJECT_TABLESPACE,
OBJECT_TRIGGER,
OBJECT_TSCONFIGURATION,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 6ff7b44..c3269c0 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -236,6 +236,7 @@ PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD)
+PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD)
PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("minvalue", MINVALUE, UNRESERVED_KEYWORD)
PG_KEYWORD("mode", MODE, UNRESERVED_KEYWORD)
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 93d93af..37ea524 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -9,7 +9,8 @@ SUBDIRS = \
worker_spi \
dummy_seclabel \
test_shm_mq \
- test_parser
+ test_parser \
+ tablesample
all: submake-errcodes
diff --git a/src/test/modules/tablesample/.gitignore b/src/test/modules/tablesample/.gitignore
new file mode 100644
index 0000000..5dcb3ff
--- /dev/null
+++ b/src/test/modules/tablesample/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/tablesample/Makefile b/src/test/modules/tablesample/Makefile
new file mode 100644
index 0000000..469b004
--- /dev/null
+++ b/src/test/modules/tablesample/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/tablesample/Makefile
+
+MODULE_big = tsm_test
+OBJS = tsm_test.o $(WIN32RES)
+PGFILEDESC = "tsm_test - example of a custom tablesample method"
+
+EXTENSION = tsm_test
+DATA = tsm_test--1.0.sql
+
+REGRESS = tablesample
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/tablesample/expected/tablesample.out b/src/test/modules/tablesample/expected/tablesample.out
new file mode 100644
index 0000000..3cb3848
--- /dev/null
+++ b/src/test/modules/tablesample/expected/tablesample.out
@@ -0,0 +1,38 @@
+CREATE EXTENSION tsm_test;
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ c81e728d9d4c2f636f067f89cc14862c | 0.5
+ a87ff679a2f3e71d9181a67b7542122c | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ d3d9446802a44259755d38e6d163e820 | 0.5
+(6 rows)
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ e4da3b7fbbce2345d7772b0674a318d5 | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ c9f0f895fb98ab9159f51fd0297e236d | 0.5
+(5 rows)
+
+DROP TABLESAMPLE METHOD tsm_test;
+ERROR: cannot drop tablesample method tsm_test because extension tsm_test requires it
+HINT: You can drop extension tsm_test instead.
+DROP EXTENSION tsm_test;
+ERROR: cannot drop extension tsm_test because other objects depend on it
+DETAIL: view test_tsm_v depends on tablesample method tsm_test
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
+DROP EXTENSION tsm_test CASCADE;
+NOTICE: drop cascades to view test_tsm_v
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
+ tsmname | tsmseqscan | tsminit | tsmnextblock | tsmnexttuple | tsmexaminetuple | tsmend | tsmreset | tsmcost
+---------+------------+---------+--------------+--------------+-----------------+--------+----------+---------
+(0 rows)
+
diff --git a/src/test/modules/tablesample/sql/tablesample.sql b/src/test/modules/tablesample/sql/tablesample.sql
new file mode 100644
index 0000000..b1104d6
--- /dev/null
+++ b/src/test/modules/tablesample/sql/tablesample.sql
@@ -0,0 +1,14 @@
+CREATE EXTENSION tsm_test;
+
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+
+DROP TABLESAMPLE METHOD tsm_test;
+DROP EXTENSION tsm_test;
+DROP EXTENSION tsm_test CASCADE;
+
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
diff --git a/src/test/modules/tablesample/tsm_test--1.0.sql b/src/test/modules/tablesample/tsm_test--1.0.sql
new file mode 100644
index 0000000..2280ab0
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test--1.0.sql
@@ -0,0 +1,51 @@
+/* src/test/modules/tablesample/tsm_test--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION tsm_test" to load this file. \quit
+
+CREATE FUNCTION tsm_test_init(internal, int4, text)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nextblock(internal)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nexttuple(internal, int4, int2)
+RETURNS int2
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_examinetuple(internal, int4, internal, bool)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_end(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_reset(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_cost(internal, internal, internal, internal, internal, internal, internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+
+CREATE TABLESAMPLE METHOD tsm_test (
+ SEQSCAN = true,
+ INIT = tsm_test_init,
+ NEXTBLOCK = tsm_test_nextblock,
+ NEXTTUPLE = tsm_test_nexttuple,
+ EXAMINETUPLE = tsm_test_examinetuple,
+ END = tsm_test_end,
+ RESET = tsm_test_reset,
+ COST = tsm_test_cost
+);
diff --git a/src/test/modules/tablesample/tsm_test.c b/src/test/modules/tablesample/tsm_test.c
new file mode 100644
index 0000000..be4dcb9
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.c
@@ -0,0 +1,228 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_test.c
+ * Simple example of a custom tablesample method
+ *
+ * Copyright (c) 2007-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/tablesample/tsm_test.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "access/relscan.h"
+#include "access/tupdesc.h"
+#include "catalog/pg_type.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+PG_MODULE_MAGIC;
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ AttrNumber attnum; /* column to check */
+ TupleDesc tupDesc; /* tuple descriptor of table */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} tsm_test_state;
+
+
+PG_FUNCTION_INFO_V1(tsm_test_init);
+PG_FUNCTION_INFO_V1(tsm_test_nextblock);
+PG_FUNCTION_INFO_V1(tsm_test_nexttuple);
+PG_FUNCTION_INFO_V1(tsm_test_examinetuple);
+PG_FUNCTION_INFO_V1(tsm_test_end);
+PG_FUNCTION_INFO_V1(tsm_test_reset);
+PG_FUNCTION_INFO_V1(tsm_test_cost);
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_test_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ char *attname;
+ AttrNumber attnum;
+ Oid atttype;
+ Relation rel = scanstate->ss.ss_currentRelation;
+ HeapScanDesc scan = scanstate->ss.ss_currentScanDesc;
+ tsm_test_state *state;
+ TupleDesc tupDesc = RelationGetDescr(rel);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("attnum cannot be NULL.")));
+
+ attname = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+ attnum = get_attnum(rel->rd_id, attname);
+ if (!AttrNumberIsForUserDefinedAttr(attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s does not exist", attname)));
+
+ atttype = get_atttype(rel->rd_id, attnum);
+ if (atttype != FLOAT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s is not of type float.", attname)));
+
+ state = palloc0(sizeof(tsm_test_state));
+
+ /* Remember initial values for reinit */
+ state->seed = seed;
+ state->attnum = attnum;
+ state->tupDesc = tupDesc;
+ state->startblock = scan->rs_startblock;
+ state->nblocks = scan->rs_nblocks;
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+ sampler_random_init_state(state->seed, state->randstate);
+
+ scanstate->tsmdata = (void *) state;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_test_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ /* Cycle from startblock to startblock to support syncscan. */
+ if (state->blockno == InvalidBlockNumber)
+ state->blockno = state->startblock;
+ else
+ {
+ state->blockno++;
+
+ if (state->blockno >= state->nblocks)
+ state->blockno = 0;
+
+ if (state->blockno == state->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(state->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ */
+Datum
+tsm_test_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ /* Advance within the block; reset at the end so the next block starts fresh. */
+ state->lt = (state->lt == InvalidOffsetNumber) ? FirstOffsetNumber : state->lt + 1;
+ if (state->lt > maxoffset)
+ state->lt = InvalidOffsetNumber;
+
+ PG_RETURN_UINT16(state->lt);
+}
+
+/*
+ * Examine tuple and decide if it should be returned.
+ */
+Datum
+tsm_test_examinetuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ HeapTuple tuple = (HeapTuple) PG_GETARG_POINTER(2);
+ bool visible = PG_GETARG_BOOL(3);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+ bool isnull;
+ float8 val, rand;
+
+ if (!visible)
+ PG_RETURN_BOOL(false);
+
+ val = DatumGetFloat8(heap_getattr(tuple, state->attnum, state->tupDesc, &isnull));
+ rand = sampler_random_fract(state->randstate);
+ if (isnull || val < rand)
+ PG_RETURN_BOOL(false);
+ else
+ PG_RETURN_BOOL(true);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_test_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_test_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+
+ sampler_random_init_state(state->seed, state->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_test_cost(PG_FUNCTION_ARGS)
+{
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+
+ *pages = baserel->pages;
+
+ /* This is a very crude estimate */
+ *tuples = path->rows = path->rows/2;
+
+ PG_RETURN_VOID();
+}
+
diff --git a/src/test/modules/tablesample/tsm_test.control b/src/test/modules/tablesample/tsm_test.control
new file mode 100644
index 0000000..a7b2741
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.control
@@ -0,0 +1,5 @@
+# tsm_test extension
+comment = 'test module for custom tablesample method'
+default_version = '1.0'
+module_pathname = '$libdir/tsm_test'
+relocatable = true
--
1.9.1
Hi,
On 22.2.2015 18:57, Petr Jelinek wrote:
Tomas noticed that the patch is missing an error check when TABLESAMPLE
is used on a view, so here is a new version that checks it's only used
against a table or matview.
No other changes.
Curious question - could/should this use page prefetch, similar to what
bitmap heap scan does? I believe the answer is 'yes'.
With SYSTEM that should be rather straightforward to implement, because
it already works at page level, and it's likely to give significant
performance speedup, similar to bitmap index scan:
/messages/by-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azsTdQ@mail.gmail.com
With BERNOULLI that might be more complex to implement because of the
page/tuple sampling, and the benefit is probably much lower than for
SYSTEM because it's likely that at least one tuple will be sampled.
I'm not saying it has to be done in this CF (or that it makes the patch
uncommittable).
For example, this seems like a very nice project for the GSoC (clear
scope, not too large, ...).
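To make the prefetch idea for SYSTEM sampling concrete, here is a minimal, self-contained sketch. It is not code from the patch: PrefetchState, next_block_with_prefetch and the prefetch callback are hypothetical names, and the sketch assumes the list of sampled blocks is known up front. In the backend the callback would presumably wrap something like PrefetchBuffer(), as bitmap heap scan does.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's BlockNumber. */
typedef uint32_t BlockNumber;

/* Hypothetical hook that asks the storage layer to start reading a block. */
typedef void (*prefetch_fn) (BlockNumber blkno, void *arg);

typedef struct
{
	const BlockNumber *blocks;	/* sampled blocks, in the order they are read */
	int			nblocks;
	int			next_read;		/* index of the next block to read */
	int			next_prefetch;	/* index of the next block to prefetch */
	int			distance;		/* how many blocks to keep requested ahead */
} PrefetchState;

/*
 * Store the next block to read in *blkno and return 1, or return 0 when the
 * sample is exhausted.  Before returning, request up to 'distance' blocks
 * beyond the one being handed out.
 */
static int
next_block_with_prefetch(PrefetchState *ps, prefetch_fn prefetch, void *arg,
						 BlockNumber *blkno)
{
	if (ps->next_read >= ps->nblocks)
		return 0;

	/* Never prefetch the block we are about to read directly. */
	if (ps->next_prefetch <= ps->next_read)
		ps->next_prefetch = ps->next_read + 1;

	while (ps->next_prefetch < ps->nblocks &&
		   ps->next_prefetch <= ps->next_read + ps->distance)
		prefetch(ps->blocks[ps->next_prefetch++], arg);

	*blkno = ps->blocks[ps->next_read++];
	return 1;
}

/* Demo callback: only counts how many prefetch requests were issued. */
static void
count_prefetch(BlockNumber blkno, void *arg)
{
	(void) blkno;
	(*(int *) arg)++;
}
```

With BERNOULLI every block is visited anyway, so a window like this would mostly duplicate ordinary sequential read-ahead, which matches the expectation above that the benefit there is lower.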
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
I didn't add the whole page visibility caching as the tuple ids we get
from sampling methods don't map well to the visibility info we get from
heapgetpage (it maps to the values in the rs_vistuples array, not to its
indexes). Commented about it in code also.
I think we should set pagemode for system sampling as it can
have dual benefit, one is it will allow us caching tuples and other
is it can allow us pruning of page which is done in heapgetpage().
Do you see any downside to it?
Few other comments:
1.
Current patch fails to apply, minor change is required:
patching file `src/backend/utils/misc/Makefile'
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to
src/backend/utils/misc/Makefile.rej
2.
Few warnings in code (compiled on windows(msvc))
1>src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' :
truncation from 'double' to 'float4'
1>src\backend\utils\tablesample\system.c(177): warning C4305: '=' :
truncation from 'double' to 'float4'
3.
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+ TableSampleClause *tablesample)
{
..
+ /*
+ * Page at a time mode is useless for us as we need to check visibility
+ * of tuples individually because tuple offsets returned by sampling
+ * methods map to rs_vistuples values and not to its indexes.
+ */
+ node->ss.ss_currentScanDesc->rs_pageatatime = false;
..
}
Modifying scandescriptor in nodeSamplescan.c looks slightly odd.
Do we modify it this way at any other place?
I think it is better to either teach heap_beginscan_* api's about
it or expose it via new API in heapam.c
4.
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
{
..
+ *tuples = path->rows * samplesize;
}
It seems above calculation considers all rows in table are of
equal size and hence multiplying directly with samplesize will
give estimated number of rows for sample method, however if
row size varies then this calculation might not give right
results. I think if possible we should consider the possibility
of rows with varying sizes else we can at least write a comment
to indicate the possibility of improvement for future.
5.
gram.y
-
/*
* Given "UPDATE foo set set ...", we have to decide without looking any
Unrelated line removed.
6.
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+ | REPEATABLE
Have you tried to investigate the reason why it is giving shift/reduce
error for unreserved category and if we are not able to resolve that,
then at least we can try to make it in some less restrictive category.
I tried (on windows) by putting it in (type_func_name_keyword:) and
it seems to be working.
To investigate, you can try with information at below link:
http://www.gnu.org/software/bison/manual/html_node/Understanding.html
Basically I think we should try some more before concluding
to change the category of REPEATABLE and especially if we
want to make it a RESERVED keyword.
On a related note, it seems you have agreed upthread with
Kyotaro-san for below change, but I don't see the same in latest patch:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
;
opt_repeatable_clause:
- REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3;
}
+ REPEATABLE '(' a_expr ')' { $$ = (Node *) $3;
}
| /*EMPTY*/
{ $$ = NULL; }
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 05/03/15 09:21, Amit Kapila wrote:
On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
I didn't add the whole page visibility caching as the tuple ids we
get from sampling methods don't map well to the visibility info we get
from heapgetpage (it maps to the values in the rs_vistuples array, not to
its indexes). Commented about it in code also.
I think we should set pagemode for system sampling as it can
have dual benefit, one is it will allow us caching tuples and other
is it can allow us pruning of page which is done in heapgetpage().
Do you see any downside to it?
Double checking for tuple visibility is the only downside I can think
of. Plus some added code complexity of course. I guess we could use
binary search on rs_vistuples (it's already sorted) so that info won't
be completely useless. Not sure if worth it compared to normal
visibility check, but not hard to do.
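The binary search mentioned here could look roughly like the sketch below. It is only an illustration: offset_is_visible is a hypothetical helper, OffsetNumber is redefined locally so the snippet stands alone, and it relies on the statement above that rs_vistuples is sorted in ascending order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's OffsetNumber (a 16-bit line pointer index). */
typedef uint16_t OffsetNumber;

/*
 * Return true if 'target' appears in vistuples[0 .. ntup-1].
 * The array must be sorted in ascending order.
 */
static bool
offset_is_visible(const OffsetNumber *vistuples, int ntup, OffsetNumber target)
{
	int			lo = 0;
	int			hi = ntup - 1;

	while (lo <= hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (vistuples[mid] == target)
			return true;
		if (vistuples[mid] < target)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return false;
}
```

A sampling method that returns tuple offsets could then ask whether an offset is among the visible ones without a second visibility test, at the cost of a logarithmic lookup per sampled tuple.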
I personally don't see the page pruning in TABLESAMPLE as all that
important though, considering that in practice it will only scan tiny
portion of a relation and usually one that does not get many updates (in
practice the main use-case is BI).
Few other comments:
1.
Current patch fails to apply, minor change is required:
patching file `src/backend/utils/misc/Makefile'
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to
src/backend/utils/misc/Makefile.rej
Ah bitrot over time.
2.
Few warnings in code (compiled on windows(msvc))
1>src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' :
truncation from 'double' to 'float4'
1>src\backend\utils\tablesample\system.c(177): warning C4305: '=' :
truncation from 'double' to 'float4'
I think this is just compiler stupidity but hopefully fixed (I don't
have msvc to check for it and other compilers I tried don't complain).
3.
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+ TableSampleClause *tablesample)
{
..
+ /*
+ * Page at a time mode is useless for us as we need to check visibility
+ * of tuples individually because tuple offsets returned by sampling
+ * methods map to rs_vistuples values and not to its indexes.
+ */
+ node->ss.ss_currentScanDesc->rs_pageatatime = false;
..
}
Modifying scandescriptor in nodeSamplescan.c looks slightly odd.
Do we modify it this way at any other place?
I think it is better to either teach heap_beginscan_* api's about
it or expose it via new API in heapam.c
Yeah I agree, I taught the heap_beginscan_strat about it as that one is
the advanced API.
4.
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
{
..
+*tuples = path->rows * samplesize;
}
It seems above calculation considers all rows in table are of
equal size and hence multiplying directly with samplesize will
give estimated number of rows for sample method, however if
row size varies then this calculation might not give right
results. I think if possible we should consider the possibility
of rows with varying sizes else we can at least write a comment
to indicate the possibility of improvement for future.
I am not sure how we would know what size would the tuples have in the
random blocks that we are going to read later. That said, I am sure that
costing can be improved even if I don't know how myself.
6.
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+| REPEATABLE
Have you tried to investigate the reason why it is giving shift/reduce
error for unreserved category and if we are not able to resolve that,
then at least we can try to make it in some less restrictive category.
I tried (on windows) by putting it in (type_func_name_keyword:) and
it seems to be working.
To investigate, you can try with information at below link:
http://www.gnu.org/software/bison/manual/html_node/Understanding.html
Basically I think we should try some more before concluding
to change the category of REPEATABLE and especially if we
want to make it a RESERVED keyword.
Yes it can be moved to type_func_name_keyword which is not all that much
better but at least something. I did try to play with this already
during first submission but could not find a way to move it to something
less restrictive.
I could not even pinpoint what exactly is the shift/reduce issue except
that it has something to do with the fact that the REPEATABLE clause is
optional (at least I didn't have the problem when it wasn't optional).
On a related note, it seems you have agreed upthread with
Kyotaro-san for below change, but I don't see the same in latest patch:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
;
opt_repeatable_clause:
- REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; }
+ REPEATABLE '(' a_expr ')' { $$ = (Node *) $3; }
| /*EMPTY*/ { $$ = NULL; }
Bah, lost this change during rebase.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 05/03/15 09:21, Amit Kapila wrote:
On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
I didn't add the whole page visibility caching as the tuple ids we
get from sampling methods don't map well to the visibility info we get
from heapgetpage (it maps to the values in the rs_vistuples array, not to
its indexes). Commented about it in code also.
I think we should set pagemode for system sampling as it can
have dual benefit, one is it will allow us caching tuples and other
is it can allow us pruning of page which is done in heapgetpage().
Do you see any downside to it?
Double checking for tuple visibility is the only downside I can think of.
That will happen if we use heapgetpage and the way currently
code is written in patch, however we can easily avoid double
checking if we don't call heapgetpage and rather do the required
work at caller's place.
Plus some added code complexity of course. I guess we could use binary
search on rs_vistuples (it's already sorted) so that info won't be
completely useless. Not sure if worth it compared to normal visibility
check, but not hard to do.
It seems to me that it is better to avoid locking/unlocking buffer
wherever possible.
I personally don't see the page pruning in TABLESAMPLE as all that
important though, considering that in practice it will only scan tiny
portion of a relation and usually one that does not get many updates (in
practice the main use-case is BI).
In that case, I think it should be acceptable either way, because
if there are less updates then anyway it won't incur any cost of
doing the pruning.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 09/03/15 04:51, Amit Kapila wrote:
On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 05/03/15 09:21, Amit Kapila wrote:
On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
I didn't add the whole page visibility caching as the tuple ids we
get from sampling methods don't map well to the visibility info we get
from heapgetpage (it maps to the values in the rs_vistuples array, not to
its indexes). Commented about it in code also.
I think we should set pagemode for system sampling as it can
have dual benefit, one is it will allow us caching tuples and other
is it can allow us pruning of page which is done in heapgetpage().
Do you see any downside to it?
Double checking for tuple visibility is the only downside I can think
of.
That will happen if we use heapgetpage and the way currently
code is written in patch, however we can easily avoid double
checking if we don't call heapgetpage and rather do the required
work at caller's place.
What's the point of pagemode then if the caller code does the visibility
checks still one by one on each call. I thought one of the points of
pagemode was to do this in one step (and one buffer lock).
And if the caller will try to do it in one step and cache the visibility
info then we'll end up with pretty much same structure as rs_vistuples -
there isn't saner way to cache this info other than ordered vector of
tuple offsets, unless we assume that most pages have close to
MaxOffsetNumber of tuples which they don't, so why not just use the
heapgetpage directly and do the binary search over rs_vistuples.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 09/03/15 04:51, Amit Kapila wrote:
On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr@2ndquadrant.com
Double checking for tuple visibility is the only downside I can think
of.
That will happen if we use heapgetpage and the way currently
code is written in patch, however we can easily avoid double
checking if we don't call heapgetpage and rather do the required
work at caller's place.
What's the point of pagemode then if the caller code does the visibility
checks still one by one on each call. I thought one of the points of
pagemode was to do this in one step (and one buffer lock).
You only need one buffer lock for doing at caller's location
as well something like we do in acquire_sample_rows().
And if the caller will try to do it in one step and cache the visibility
info then we'll end up with pretty much same structure as rs_vistuples -
there isn't saner way to cache this info other than ordered vector of tuple
offsets, unless we assume that most pages have close to MaxOffsetNumber of
tuples which they don't, so why not just use the heapgetpage directly and
do the binary search over rs_vistuples.
The downside of doing it via heapgetpage is that it will do
visibility test for tuples which we might not even need (I think
we should do visibility test for tuples returned by tsmnexttuple).
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 10/03/15 04:43, Amit Kapila wrote:
On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 09/03/15 04:51, Amit Kapila wrote:
On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr@2ndquadrant.com
Double checking for tuple visibility is the only downside I can think
of.
That will happen if we use heapgetpage and the way currently
code is written in patch, however we can easily avoid double
checking if we don't call heapgetpage and rather do the required
work at caller's place.
What's the point of pagemode then if the caller code does the
visibility checks still one by one on each call. I thought one of the
points of pagemode was to do this in one step (and one buffer lock).
You only need one buffer lock for doing at caller's location
as well something like we do in acquire_sample_rows().
Ok now I think I finally understand what you are suggesting - you are
saying let's go over whole page while tsmnexttuple returns something,
and do the visibility check and other stuff in that code block under the
buffer lock and cache the resulting valid tuples in some array and then
return those tuples one by one from that cache?
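The approach described in this paragraph can be sketched as follows. This is only a self-contained illustration, not code from the patch: PageCache, cache_sampled_page, cache_next, MAX_CACHED and the two callback types are hypothetical stand-ins for tsmnexttuple and the per-tuple visibility test, and the buffer locking is shown only as comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;
#define InvalidOffsetNumber ((OffsetNumber) 0)
#define MAX_CACHED 64			/* hypothetical cap, for this sketch only */

/* Hypothetical callbacks standing in for tsmnexttuple and the visibility test. */
typedef OffsetNumber (*next_tuple_fn) (void *state, OffsetNumber maxoffset);
typedef bool (*visible_fn) (OffsetNumber off);

typedef struct
{
	OffsetNumber offsets[MAX_CACHED];	/* visible sampled tuples on this page */
	int			ntuples;
	int			next;					/* next cached entry to hand out */
} PageCache;

/* Walk one page: keep every sampled offset that passes the visibility test. */
static void
cache_sampled_page(PageCache *cache, void *state, OffsetNumber maxoffset,
				   next_tuple_fn next_tuple, visible_fn is_visible)
{
	OffsetNumber off;

	cache->ntuples = 0;
	cache->next = 0;

	/* In the executor, the buffer lock would be taken here ... */
	while ((off = next_tuple(state, maxoffset)) != InvalidOffsetNumber)
	{
		/* Offsets beyond the sketch's cap are silently dropped. */
		if (is_visible(off) && cache->ntuples < MAX_CACHED)
			cache->offsets[cache->ntuples++] = off;
	}
	/* ... and released here, after a single pass over the page. */
}

/* Return cached offsets one by one, InvalidOffsetNumber when exhausted. */
static OffsetNumber
cache_next(PageCache *cache)
{
	if (cache->next >= cache->ntuples)
		return InvalidOffsetNumber;
	return cache->offsets[cache->next++];
}

/* Demo sampler: returns every offset from 1 to maxoffset in order. */
static OffsetNumber
demo_next_tuple(void *state, OffsetNumber maxoffset)
{
	OffsetNumber *last = (OffsetNumber *) state;

	if (*last >= maxoffset)
		return InvalidOffsetNumber;
	return ++(*last);
}

/* Demo visibility test: pretend only even offsets are visible. */
static bool
demo_visible(OffsetNumber off)
{
	return (off % 2) == 0;
}
```

Only the tuples that the sampling method actually picks are tested for visibility, which is the advantage claimed for this approach over calling heapgetpage in pagemode.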
And if the caller will try to do it in one step and cache the
visibility info then we'll end up with pretty much same structure as
rs_vistuples - there isn't saner way to cache this info other than
ordered vector of tuple offsets, unless we assume that most pages have
close to MaxOffsetNumber of tuples which they don't, so why not just use
the heapgetpage directly and do the binary search over rs_vistuples.
The downside of doing it via heapgetpage is that it will do
visibility test for tuples which we might not even need (I think
we should do visibility test for tuples returned by tsmnexttuple).
Well, heapgetpage can either read visibility data for whole page or not,
depending on if we want pagemode reading or not. So we can use the
pagemode for sampling methods where it's feasible (like system) and not
use pagemode where it's not (like bernoulli) and then either use the
rs_vistuples or call HeapTupleSatisfiesVisibility individually again
depending if the method is using pagemode or not. This is how sequential
scan does it afaik.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 10/03/15 04:43, Amit Kapila wrote:
On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 09/03/15 04:51, Amit Kapila wrote:
On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr@2ndquadrant.com
Double checking for tuple visibility is the only downside I can think
of.
That will happen if we use heapgetpage and the way currently
code is written in patch, however we can easily avoid double
checking if we don't call heapgetpage and rather do the required
work at caller's place.
What's the point of pagemode then if the caller code does the
visibility checks still one by one on each call. I thought one of the
points of pagemode was to do this in one step (and one buffer lock).
You only need one buffer lock for doing at caller's location
as well something like we do in acquire_sample_rows().
Ok now I think I finally understand what you are suggesting - you are
saying let's go over whole page while tsmnexttuple returns something, and
do the visibility check and other stuff in that code block under the buffer
lock and cache the resulting valid tuples in some array and then return
those tuples one by one from that cache?
Yes, this is what I am suggesting.
And if the caller will try to do it in one step and cache the
visibility info then we'll end up with pretty much same structure as
rs_vistuples - there isn't saner way to cache this info other than
ordered vector of tuple offsets, unless we assume that most pages have
close to MaxOffsetNumber of tuples which they don't, so why not just use
the heapgetpage directly and do the binary search over rs_vistuples.
The downside of doing it via heapgetpage is that it will do
visibility test for tuples which we might not even need (I think
we should do visibility test for tuples returned by tsmnexttuple).
Well, heapgetpage can either read visibility data for whole page or not,
depending on if we want pagemode reading or not. So we can use the pagemode
for sampling methods where it's feasible (like system) and not use pagemode
where it's not (like bernoulli) and then either use the rs_vistuples or
call HeapTupleSatisfiesVisibility individually again depending if the
method is using pagemode or not.
Yeah, but as mentioned above, this has some downside, but go
for it only if you feel that above suggestion is making code complex,
which I think should not be the case as we are doing something similar
in acquire_sample_rows().
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 10/03/15 10:54, Amit Kapila wrote:
On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Ok now I think I finally understand what you are suggesting - you are
saying let's go over whole page while tsmnexttuple returns something,
and do the visibility check and other stuff in that code block under the
buffer lock and cache the resulting valid tuples in some array and then
return those tuples one by one from that cache?
Yes, this is what I am suggesting.
And if the caller will try to do it in one step and cache the
visibility info then we'll end up with pretty much same structure as
rs_vistuples - there isn't saner way to cache this info other than
ordered vector of tuple offsets, unless we assume that most pages have
close to MaxOffsetNumber of tuples which they don't, so why not just use
the heapgetpage directly and do the binary search over rs_vistuples.
The downside of doing it via heapgetpage is that it will do
visibility test for tuples which we might not even need (I think
we should do visibility test for tuples returned by tsmnexttuple).
Well, heapgetpage can either read visibility data for whole page or
not, depending on if we want pagemode reading or not. So we can use the
pagemode for sampling methods where it's feasible (like system) and not
use pagemode where it's not (like bernoulli) and then either use the
rs_vistuples or call HeapTupleSatisfiesVisibility individually again
depending if the method is using pagemode or not.
Yeah, but as mentioned above, this has some downside, but go
for it only if you feel that above suggestion is making code complex,
which I think should not be the case as we are doing something similar
in acquire_sample_rows().
I think your suggestion is actually simpler code wise, I am just
somewhat worried by the fact that no other scan node uses that kind of
caching and there is probably reason for that.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/03/15 11:05, Petr Jelinek wrote:
On 10/03/15 10:54, Amit Kapila wrote:
On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr@2ndquadrant.com
<mailto:petr@2ndquadrant.com>> wrote:
Ok now I think I finally understand what you are suggesting - you are
saying let's go over whole page while tsmnexttuple returns something,
and do the visibility check and other stuff in that code block under the
buffer lock and cache the resulting valid tuples in some array and then
return those tuples one by one from that cache?
Yes, this is what I am suggesting.
And if the caller will try to do it in one step and cache the
visibility info then we'll end up with pretty much same structure as
rs_vistuples - there isn't saner way to cache this info other than
ordered vector of tuple offsets, unless we assume that most pages have
close to MaxOffsetNumber of tuples which they don't, so why not just use
the heapgetpage directly and do the binary search over rs_vistuples.
The downside of doing it via heapgetpage is that it will do
visibility test for tuples which we might not even need (I think
we should do visibility test for tuples returned by tsmnexttuple).
Well, heapgetpage can either read visibility data for whole page or
not, depending on if we want pagemode reading or not. So we can use the
pagemode for sampling methods where it's feasible (like system) and not
use pagemode where it's not (like bernoulli) and then either use the
rs_vistuples or call HeapTupleSatisfiesVisibility individually again
depending if the method is using pagemode or not.
Yeah, but as mentioned above, this has some downside, but go
for it only if you feel that above suggestion is making code complex,
which I think should not be the case as we are doing something similar
in acquire_sample_rows().
I think your suggestion is actually simpler code wise, I am just
somewhat worried by the fact that no other scan node uses that kind of
caching and there is probably reason for that.
So it turned out to be simpler code-wise and slightly better performing
to still use the heapgetpage approach (the caching approach introduces a
lot of pallocing and tuple copying which seems to hurt performance a
bit). In general it seems that the sampling scan is different enough
from ANALYZE that the same principles just can't be applied well to it.
There are now 2 ways of checking the visibility in sample scan: if the
sample method says that it will read whole pages, it will just use the
rs_vistuples, and if it says it won't read whole pages, then it executes
HeapTupleSatisfiesVisibility() on individual tuples. The buffer
locking is also done only if the whole page reading does not happen.
We trust the author of sampling method to set this correctly - it only
has performance related implications, everything should still work
correctly even if sampling method sets this wrongly so I think that's
acceptable.
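To make the two visibility paths concrete, here is a minimal standalone sketch. It is not the code from the patch: MockPage, OffsetNum, build_vistuples and the other names are hypothetical stand-ins, and the per-tuple test is reduced to an array lookup where the real scan would call HeapTupleSatisfiesVisibility() under the buffer lock. It only illustrates that both paths should give the same answer, which is why a wrongly set pagemode flag would cost performance rather than correctness.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL definitions. */
#define MAX_TUPLES 32
typedef unsigned short OffsetNum;

typedef struct MockPage
{
    int         ntuples;                /* tuples on the page, offsets 1..ntuples */
    const bool *visible;                /* visible[off - 1]: visibility of one tuple */
    OffsetNum   vistuples[MAX_TUPLES];  /* sorted visible offsets (pagemode cache) */
    int         nvistuples;             /* number of valid entries in vistuples */
} MockPage;

/* Pagemode: compute visibility for the whole page once, in offset order. */
static void
build_vistuples(MockPage *page)
{
    assert(page->ntuples <= MAX_TUPLES);
    page->nvistuples = 0;
    for (int off = 1; off <= page->ntuples; off++)
        if (page->visible[off - 1])
            page->vistuples[page->nvistuples++] = (OffsetNum) off;
}

/* Pagemode lookup: binary search over the sorted cache of visible offsets. */
static bool
visible_pagemode(const MockPage *page, OffsetNum off)
{
    int lo = 0;
    int hi = page->nvistuples - 1;

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (page->vistuples[mid] == off)
            return true;
        if (page->vistuples[mid] < off)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}

/* Non-pagemode lookup: test just the one tuple the sampling method picked. */
static bool
visible_single(const MockPage *page, OffsetNum off)
{
    return off >= 1 && off <= page->ntuples && page->visible[off - 1];
}

/* Dispatch on the flag declared by the sampling method. */
static bool
sample_tuple_visible(const MockPage *page, OffsetNum off, bool pagemode)
{
    return pagemode ? visible_pagemode(page, off) : visible_single(page, off);
}
```

In this model a caller would run build_vistuples() once per page when the method declares pagemode, then ask sample_tuple_visible() for each offset the method returns; without pagemode the cache is never built and only the chosen tuples are tested.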
I also did all the other adjustments we talked about up-thread and
rebased against current master (there was conflict with 31eae6028).
In the end I decided to not overload the heap_beginscan_strat even more
but just create a new heap_beginscan_ss interface.
Also while playing with the keywords I realized that not only REPEATABLE
can be moved to type_func_name_keyword but also TABLESAMPLE can be moved
down to col_name_keyword so both keywords were moved one level down
compared to previous versions. It's the best I could do in terms of
keywords.
Attached is new version.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0003-tablesample-ddl-v5.patch (text/x-diff)
From 70cd678bdfe87bde171eeeab6345ba684c58c8bb Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:51:44 +0100
Subject: [PATCH 3/3] tablesample-ddl v5
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_tablesamplemethod.sgml | 184 +++++++++
doc/src/sgml/ref/drop_tablesamplemethod.sgml | 87 +++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/dependency.c | 15 +-
src/backend/catalog/objectaddress.c | 65 +++-
src/backend/commands/Makefile | 6 +-
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/tablecmds.c | 1 +
src/backend/commands/tablesample.c | 427 +++++++++++++++++++++
src/backend/parser/gram.y | 14 +-
src/backend/tcop/utility.c | 12 +
src/bin/pg_dump/common.c | 5 +
src/bin/pg_dump/pg_dump.c | 177 +++++++++
src/bin/pg_dump/pg_dump.h | 11 +-
src/bin/pg_dump/pg_dump_sort.c | 11 +-
src/include/catalog/dependency.h | 1 +
src/include/catalog/pg_tablesample_method.h | 11 +
src/include/nodes/parsenodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/modules/Makefile | 3 +-
src/test/modules/tablesample/.gitignore | 4 +
src/test/modules/tablesample/Makefile | 21 +
.../modules/tablesample/expected/tablesample.out | 38 ++
src/test/modules/tablesample/sql/tablesample.sql | 14 +
src/test/modules/tablesample/tsm_test--1.0.sql | 52 +++
src/test/modules/tablesample/tsm_test.c | 228 +++++++++++
src/test/modules/tablesample/tsm_test.control | 5 +
29 files changed, 1394 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/create_tablesamplemethod.sgml
create mode 100644 doc/src/sgml/ref/drop_tablesamplemethod.sgml
create mode 100644 src/backend/commands/tablesample.c
create mode 100644 src/test/modules/tablesample/.gitignore
create mode 100644 src/test/modules/tablesample/Makefile
create mode 100644 src/test/modules/tablesample/expected/tablesample.out
create mode 100644 src/test/modules/tablesample/sql/tablesample.sql
create mode 100644 src/test/modules/tablesample/tsm_test--1.0.sql
create mode 100644 src/test/modules/tablesample/tsm_test.c
create mode 100644 src/test/modules/tablesample/tsm_test.control
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 7aa3128..2fad084 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -78,6 +78,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createServer SYSTEM "create_server.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
+<!ENTITY createTablesampleMethod SYSTEM "create_tablesamplemethod.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
<!ENTITY createTrigger SYSTEM "create_trigger.sgml">
<!ENTITY createTSConfig SYSTEM "create_tsconfig.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
+<!ENTITY dropTablesampleMethod SYSTEM "drop_tablesamplemethod.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTrigger SYSTEM "drop_trigger.sgml">
<!ENTITY dropTSConfig SYSTEM "drop_tsconfig.sgml">
diff --git a/doc/src/sgml/ref/create_tablesamplemethod.sgml b/doc/src/sgml/ref/create_tablesamplemethod.sgml
new file mode 100644
index 0000000..ff105d2
--- /dev/null
+++ b/doc/src/sgml/ref/create_tablesamplemethod.sgml
@@ -0,0 +1,184 @@
+<!--
+doc/src/sgml/ref/create_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATETABLESAMPLEMETHOD">
+ <indexterm zone="sql-createtablesamplemethod">
+ <primary>CREATE TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE TABLESAMPLE METHOD</refname>
+ <refpurpose>define custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE TABLESAMPLE METHOD <replaceable class="parameter">name</replaceable> (
+ INIT = <replaceable class="parameter">init_function</replaceable> ,
+ NEXTBLOCK = <replaceable class="parameter">nextblock_function</replaceable> ,
+ NEXTTUPLE = <replaceable class="parameter">nexttuple_function</replaceable> ,
+ END = <replaceable class="parameter">end_function</replaceable> ,
+ RESET = <replaceable class="parameter">reset_function</replaceable> ,
+ COST = <replaceable class="parameter">cost_function</replaceable>
+ [ , EXAMINETUPLE = <replaceable class="parameter">examinetuple_function</replaceable> ]
+ [ , SEQSCAN = <replaceable class="parameter">seqscan</replaceable> ]
+ [ , PAGEMODE = <replaceable class="parameter">pagemode</replaceable> ]
+)
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>CREATE TABLESAMPLE METHOD</command> creates a tablesample method.
+ A tablesample method provides an algorithm for reading a sample of a table
+ when used in the <command>TABLESAMPLE</> clause of a <command>SELECT</>
+ statement.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>CREATE TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of the tablesample method to be created. This name must be
+ unique within the database.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">init_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the init function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nextblock_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-block function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nexttuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-tuple function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">end_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the end function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">reset_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the reset function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">cost_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the costing function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">examinetuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the function for inspecting the tuple contents in order
+ to make decision if it should be returned or not. This parameter
+ is optional.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">seqscan</replaceable></term>
+ <listitem>
+ <para>
+ True if the sampling method will do sequential scan of the whole table.
+ Used for cost estimation and syncscan. The default value if not specified
+ is False.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">pagemode</replaceable></term>
+ <listitem>
+ <para>
+ True if the sampling method will read whole page at a time. The default
+ value if not specified is False.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>
+ The function names can be schema-qualified if necessary. Argument types
+ are not given, since the argument list for each type of function is
+ predetermined. All functions except the optional examinetuple function are required.
+ </para>
+
+ <para>
+ The arguments can appear in any order, not only the one shown above.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>CREATE TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-droptablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_tablesamplemethod.sgml b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
new file mode 100644
index 0000000..dffd2ec
--- /dev/null
+++ b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
@@ -0,0 +1,87 @@
+<!--
+doc/src/sgml/ref/drop_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPTABLESAMPLEMETHOD">
+ <indexterm zone="sql-droptablesamplemethod">
+ <primary>DROP TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP TABLESAMPLE METHOD</refname>
+ <refpurpose>remove a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP TABLESAMPLE METHOD [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP TABLESAMPLE METHOD</command> drops an existing tablesample
+ method.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>DROP TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the tablesample method does not exist.
+ A notice is issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an existing tablesample method to be removed.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>DROP TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createtablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 10c9a6d..2c09a3c 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -106,6 +106,7 @@
&createServer;
&createTable;
&createTableAs;
+ &createTablesampleMethod;
&createTableSpace;
&createTSConfig;
&createTSDictionary;
@@ -147,6 +148,7 @@
&dropSequence;
&dropServer;
&dropTable;
+ &dropTablesampleMethod;
&dropTableSpace;
&dropTSConfig;
&dropTSDictionary;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index bacb242..6acb5b3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -46,6 +46,7 @@
#include "catalog/pg_policy.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -157,7 +158,8 @@ static const Oid object_classes[MAX_OCLASS] = {
DefaultAclRelationId, /* OCLASS_DEFACL */
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
- PolicyRelationId /* OCLASS_POLICY */
+ PolicyRelationId, /* OCLASS_POLICY */
+ TableSampleMethodRelationId /* OCLASS_TABLESAMPLEMETHOD */
};
@@ -1265,6 +1267,10 @@ doDeletion(const ObjectAddress *object, int flags)
RemovePolicyById(object->objectId);
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ RemoveTablesampleMethodById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -1794,6 +1800,10 @@ find_expr_references_walker(Node *node,
case RTE_RELATION:
add_object_address(OCLASS_CLASS, rte->relid, 0,
context->addrs);
+ if (rte->tablesample)
+ add_object_address(OCLASS_TABLESAMPLEMETHOD,
+ rte->tablesample->tsmid, 0,
+ context->addrs);
break;
default:
break;
@@ -2373,6 +2383,9 @@ getObjectClass(const ObjectAddress *object)
case PolicyRelationId:
return OCLASS_POLICY;
+
+ case TableSampleMethodRelationId:
+ return OCLASS_TABLESAMPLEMETHOD;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 142bc68..b50925f 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -44,6 +44,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_policy.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -429,7 +430,19 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
- }
+ },
+ {
+ TableSampleMethodRelationId,
+ TableSampleMethodOidIndexId,
+ TABLESAMPLEMETHODOID,
+ TABLESAMPLEMETHODNAME,
+ Anum_pg_tablesample_method_tsmname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ true
+ },
};
/*
@@ -528,7 +541,9 @@ ObjectTypeMap[] =
/* OCLASS_EVENT_TRIGGER */
{ "event trigger", OBJECT_EVENT_TRIGGER },
/* OCLASS_POLICY */
- { "policy", OBJECT_POLICY }
+ { "policy", OBJECT_POLICY },
+ /* OCLASS_TABLESAMPLEMETHOD */
+ { "tablesample method", OBJECT_TABLESAMPLEMETHOD }
};
const ObjectAddress InvalidObjectAddress =
@@ -680,6 +695,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_FDW:
case OBJECT_FOREIGN_SERVER:
case OBJECT_EVENT_TRIGGER:
+ case OBJECT_TABLESAMPLEMETHOD:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -914,6 +930,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_EVENT_TRIGGER:
msg = gettext_noop("event trigger name cannot be qualified");
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -974,6 +993,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_event_trigger_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ address.classId = TableSampleMethodRelationId;
+ address.objectId = get_tablesample_method_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -1916,6 +1940,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
break;
case OBJECT_TSPARSER:
case OBJECT_TSTEMPLATE:
+ case OBJECT_TABLESAMPLEMETHOD:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -2854,6 +2879,21 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("tablesample method %s"),
+ NameStr(((Form_pg_tablesample_method) GETSTRUCT(tup))->tsmname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -3331,6 +3371,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "policy");
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ appendStringInfoString(&buffer, "tablesample method");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4242,6 +4286,23 @@ getObjectIdentityParts(const ObjectAddress *object,
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+ Form_pg_tablesample_method tsmForm;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ tsmForm = (Form_pg_tablesample_method) GETSTRUCT(tup);
+ appendStringInfoString(&buffer,
+ quote_identifier(NameStr(tsmForm->tsmname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..04fcd8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o tablecmds.o tablesample.o \
+ tablespace.o trigger.o tsearchcmds.o typecmds.o user.o vacuum.o \
+ vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index e5185ba..04d29a2 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -421,6 +421,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
args = strVal(linitial(objargs));
}
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
default:
elog(ERROR, "unexpected object type (%d)", (int) objtype);
break;
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 3fec57e..f774975 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -97,6 +97,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SEQUENCE", true},
{"SERVER", true},
{"TABLE", true},
+ {"TABLESAMPLE METHOD", true},
{"TABLESPACE", false},
{"TRIGGER", true},
{"TEXT SEARCH CONFIGURATION", true},
@@ -1087,6 +1088,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_SEQUENCE:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
+ case OBJECT_TABLESAMPLEMETHOD:
case OBJECT_TRIGGER:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
@@ -1144,6 +1146,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_TABLESAMPLEMETHOD:
return true;
case MAX_OCLASS:
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 623e6bf..aad7a58 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8086,6 +8086,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
case OCLASS_USER_MAPPING:
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
+ case OCLASS_TABLESAMPLEMETHOD:
/*
* We don't expect any of these sorts of objects to depend on
diff --git a/src/backend/commands/tablesample.c b/src/backend/commands/tablesample.c
new file mode 100644
index 0000000..f40820b
--- /dev/null
+++ b/src/backend/commands/tablesample.c
@@ -0,0 +1,427 @@
+/*-------------------------------------------------------------------------
+ *
+ * tablesample.c
+ * Commands to manipulate tablesample methods
+ *
+ * Table sampling methods provide algorithms for doing sample scan over
+ * the table.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/tablesample.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "parser/parse_func.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+
+
+static Datum
+get_tablesample_method_func(DefElem *defel, int attnum)
+{
+ List *funcName = defGetQualifiedName(defel);
+ /* Big enough size for our needs. */
+ Oid *typeId = palloc0(7 * sizeof(Oid));
+ Oid retTypeId;
+ int nargs;
+ Oid procOid = InvalidOid;
+ FuncCandidateList clist;
+
+ switch (attnum)
+ {
+ case Anum_pg_tablesample_method_tsminit:
+ /*
+ * tsminit needs special handling because it is defined as function
+ * with 3 or more arguments and only first two arguments must have
+ * specific type, the rest is up to the tablesample method creator.
+ */
+ {
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ retTypeId = VOIDOID;
+
+ clist = FuncnameGetCandidates(funcName, -1, NIL, false, false, false);
+
+ while (clist)
+ {
+ if (clist->nargs >= 3 &&
+ memcmp(typeId, clist->args, nargs * sizeof(Oid)) == 0)
+ {
+ procOid = clist->oid;
+ /* Save real function signature for future errors. */
+ nargs = clist->nargs;
+ pfree(typeId);
+ typeId = clist->args;
+ break;
+ }
+ clist = clist->next;
+ }
+
+ if (!OidIsValid(procOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("function \"%s\" does not exist or does not have valid signature",
+ NameListToString(funcName)),
+ errhint("The tablesample method init function "
+ "must have at least 3 input parameters "
+ "with first one of type INTERNAL and second of type INTEGER.")));
+ }
+ break;
+
+ case Anum_pg_tablesample_method_tsmnextblock:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = INT4OID;
+ break;
+ case Anum_pg_tablesample_method_tsmnexttuple:
+ nargs = 3;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INT2OID;
+ retTypeId = INT2OID;
+ break;
+ case Anum_pg_tablesample_method_tsmexaminetuple:
+ nargs = 4;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = BOOLOID;
+ retTypeId = BOOLOID;
+ break;
+ case Anum_pg_tablesample_method_tsmend:
+ case Anum_pg_tablesample_method_tsmreset:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ case Anum_pg_tablesample_method_tsmcost:
+ nargs = 7;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INTERNALOID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = INTERNALOID;
+ typeId[4] = INTERNALOID;
+ typeId[5] = INTERNALOID;
+ typeId[6] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ default:
+ /* should not be here */
+ elog(ERROR, "unrecognized attribute for tablesample method: %d",
+ attnum);
+ nargs = 0; /* keep compiler quiet */
+ }
+
+ if (!OidIsValid(procOid))
+ procOid = LookupFuncName(funcName, nargs, typeId, false);
+ if (get_func_rettype(procOid) != retTypeId)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("function %s should return type %s",
+ func_signature_string(funcName, nargs, NIL, typeId),
+ format_type_be(retTypeId))));
+
+ return ObjectIdGetDatum(procOid);
+}
+
+/*
+ * make pg_depend entries for a new pg_tablesample_method entry
+ */
+static void
+makeTablesampleMethodDeps(HeapTuple tuple)
+{
+ Form_pg_tablesample_method tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ ObjectAddress myself,
+ referenced;
+
+ myself.classId = TableSampleMethodRelationId;
+ myself.objectId = HeapTupleGetOid(tuple);
+ myself.objectSubId = 0;
+
+ /* dependency on extension */
+ recordDependencyOnCurrentExtension(&myself, false);
+
+ /* dependencies on functions */
+ referenced.classId = ProcedureRelationId;
+ referenced.objectSubId = 0;
+
+ referenced.objectId = tsm->tsminit;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnextblock;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnexttuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ if (OidIsValid(tsm->tsmexaminetuple))
+ {
+ referenced.objectId = tsm->tsmexaminetuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ }
+
+ referenced.objectId = tsm->tsmend;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmreset;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmcost;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+}
+
+/*
+ * Create a table sampling method
+ *
+ * Only superusers can create table sampling methods.
+ */
+Oid
+DefineTablesampleMethod(List *names, List *parameters)
+{
+ char *tsmname = strVal(linitial(names));
+ Oid tsmoid;
+ ListCell *pl;
+ Relation rel;
+ Datum values[Natts_pg_tablesample_method];
+ bool nulls[Natts_pg_tablesample_method];
+ HeapTuple tuple;
+
+ /* Must be super user. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to create tablesample method \"%s\"",
+ tsmname),
+ errhint("Must be superuser to create a tablesample method.")));
+
+ /* Must not already exist. */
+ tsmoid = get_tablesample_method_oid(tsmname, true);
+ if (OidIsValid(tsmoid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("tablesample method \"%s\" already exists",
+ tsmname)));
+
+ /* Initialize the values. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_tablesample_method_tsmname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(tsmname));
+
+ /*
+ * loop over the definition list and extract the information we need.
+ */
+ foreach(pl, parameters)
+ {
+ DefElem *defel = (DefElem *) lfirst(pl);
+
+ if (pg_strcasecmp(defel->defname, "seqscan") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmseqscan - 1] =
+ BoolGetDatum(defGetBoolean(defel));
+ }
+ else if (pg_strcasecmp(defel->defname, "pagemode") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmpagemode - 1] =
+ BoolGetDatum(defGetBoolean(defel));
+ }
+ else if (pg_strcasecmp(defel->defname, "init") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsminit - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsminit);
+ }
+ else if (pg_strcasecmp(defel->defname, "nextblock") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnextblock - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnextblock);
+ }
+ else if (pg_strcasecmp(defel->defname, "nexttuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnexttuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnexttuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "examinetuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmexaminetuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmexaminetuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "end") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmend - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmend);
+ }
+ else if (pg_strcasecmp(defel->defname, "reset") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmreset - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmreset);
+ }
+ else if (pg_strcasecmp(defel->defname, "cost") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmcost - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmcost);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("tablesample method parameter \"%s\" not recognized",
+ defel->defname)));
+ }
+
+ /*
+ * Validation.
+ */
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsminit - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method init function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnextblock - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nextblock function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnexttuple - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nexttuple function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmend - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method end function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmreset - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method reset function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmcost - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method cost function is required")));
+
+ /*
+ * Insert tuple into pg_tablesample_method.
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = heap_form_tuple(rel->rd_att, values, nulls);
+
+ tsmoid = simple_heap_insert(rel, tuple);
+
+ CatalogUpdateIndexes(rel, tuple);
+
+ makeTablesampleMethodDeps(tuple);
+
+ heap_freetuple(tuple);
+
+ /* Post creation hook for new tablesample method */
+ InvokeObjectPostCreateHook(TableSampleMethodRelationId, tsmoid, 0);
+
+ heap_close(rel, RowExclusiveLock);
+
+ return tsmoid;
+}
+
+/*
+ * Drop a tablesample method.
+ */
+void
+RemoveTablesampleMethodById(Oid tsmoid)
+{
+ Relation rel;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+
+ /*
+ * Find the target tuple
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tsmoid));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ tsmoid);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ /* Can't drop builtin tablesample methods. */
+ if (tsmoid == TABLESAMPLE_METHOD_SYSTEM_OID ||
+ tsmoid == TABLESAMPLE_METHOD_BERNOULLI_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied for tablesample method %s",
+ NameStr(tsm->tsmname))));
+
+ /*
+ * Remove the pg_tablesample_method tuple (this will roll back if we fail below)
+ */
+ simple_heap_delete(rel, &tuple->t_self);
+
+ ReleaseSysCache(tuple);
+
+ heap_close(rel, RowExclusiveLock);
+}
+
+/*
+ * get_tablesample_method_oid - given a tablesample method name,
+ * look up the OID
+ *
+ * If missing_ok is false, throw an error if tablesample method name not found.
+ * If true, just return InvalidOid.
+ */
+Oid
+get_tablesample_method_oid(const char *tsmname, bool missing_ok)
+{
+ Oid result;
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(tsmname));
+ if (HeapTupleIsValid(tuple))
+ {
+ result = HeapTupleGetOid(tuple);
+ ReleaseSysCache(tuple);
+ }
+ else
+ result = InvalidOid;
+
+ if (!OidIsValid(result) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ tsmname)));
+
+ return result;
+}
+
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 41f71d6..9f02d31 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -590,7 +590,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
- MAPPING MATCH MATERIALIZED MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
+ MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P
+ MOVE
NAME_P NAMES NATIONAL NATURAL NCHAR NEXT NO NONE
NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -5091,6 +5092,15 @@ DefineStmt:
n->definition = list_make1(makeDefElem("from", (Node *) $5));
$$ = (Node *)n;
}
+ | CREATE TABLESAMPLE METHOD name definition
+ {
+ DefineStmt *n = makeNode(DefineStmt);
+ n->kind = OBJECT_TABLESAMPLEMETHOD;
+ n->args = NIL;
+ n->defnames = list_make1(makeString($4));
+ n->definition = $5;
+ $$ = (Node *)n;
+ }
;
definition: '(' def_list ')' { $$ = $2; }
@@ -5549,6 +5559,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | TABLESAMPLE METHOD { $$ = OBJECT_TABLESAMPLEMETHOD; }
;
any_name_list:
@@ -13443,6 +13454,7 @@ unreserved_keyword:
| MATCH
| MATERIALIZED
| MAXVALUE
+ | METHOD
| MINUTE_P
| MINVALUE
| MODE
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 126e38d..862113b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -23,6 +23,7 @@
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/namespace.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/toasting.h"
#include "commands/alter.h"
#include "commands/async.h"
@@ -1135,6 +1136,11 @@ ProcessUtilitySlow(Node *parsetree,
Assert(stmt->args == NIL);
DefineCollation(stmt->defnames, stmt->definition);
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ Assert(stmt->args == NIL);
+ Assert(list_length(stmt->defnames) == 1);
+ DefineTablesampleMethod(stmt->defnames, stmt->definition);
+ break;
default:
elog(ERROR, "unrecognized define stmt type: %d",
(int) stmt->kind);
@@ -2003,6 +2009,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_POLICY:
tag = "DROP POLICY";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "DROP TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
@@ -2099,6 +2108,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_COLLATION:
tag = "CREATE COLLATION";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "CREATE TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 1a0a587..8a64e4b 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -103,6 +103,7 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
int numForeignServers;
int numDefaultACLs;
int numEventTriggers;
+ int numTSMs;
if (g_verbose)
write_msg(NULL, "reading schemas\n");
@@ -251,6 +252,10 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
write_msg(NULL, "reading policies\n");
getPolicies(fout, tblinfo, numTables);
+ if (g_verbose)
+ write_msg(NULL, "reading tablesample methods\n");
+ getTableSampleMethods(fout, &numTSMs);
+
*numTablesPtr = numTables;
return tblinfo;
}
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index fdfb431..20630cf 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -182,6 +182,7 @@ static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo);
static void dumpIndex(Archive *fout, DumpOptions *dopt, IndxInfo *indxinfo);
static void dumpConstraint(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
static void dumpTableConstraintComment(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
+static void dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo);
static void dumpTSParser(Archive *fout, DumpOptions *dopt, TSParserInfo *prsinfo);
static void dumpTSDictionary(Archive *fout, DumpOptions *dopt, TSDictInfo *dictinfo);
static void dumpTSTemplate(Archive *fout, DumpOptions *dopt, TSTemplateInfo *tmplinfo);
@@ -7134,6 +7135,78 @@ getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tblinfo, int numTable
}
/*
+ * getTableSampleMethods:
+ * read all tablesample methods in the system catalogs and return them
+ * in the TSMInfo* structure
+ *
+ * numTSMs is set to the number of tablesample methods read in
+ */
+TSMInfo *
+getTableSampleMethods(Archive *fout, int *numTSMs)
+{
+ PGresult *res;
+ int ntups;
+ int i;
+ PQExpBuffer query;
+ TSMInfo *tsminfo;
+ int i_tableoid,
+ i_oid,
+ i_tsmname,
+ i_tsmseqscan,
+ i_tsmpagemode;
+
+ /* Before 9.5, there were no tablesample methods */
+ if (fout->remoteVersion < 90500)
+ {
+ *numTSMs = 0;
+ return NULL;
+ }
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT tableoid, oid, tsmname, tsmseqscan, tsmpagemode "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid >= '%u'::pg_catalog.oid",
+ FirstNormalObjectId);
+
+ res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+ ntups = PQntuples(res);
+ *numTSMs = ntups;
+
+ tsminfo = (TSMInfo *) pg_malloc(ntups * sizeof(TSMInfo));
+
+ i_tableoid = PQfnumber(res, "tableoid");
+ i_oid = PQfnumber(res, "oid");
+ i_tsmname = PQfnumber(res, "tsmname");
+ i_tsmseqscan = PQfnumber(res, "tsmseqscan");
+ i_tsmpagemode = PQfnumber(res, "tsmpagemode");
+
+ for (i = 0; i < ntups; i++)
+ {
+ tsminfo[i].dobj.objType = DO_TABLESAMPLE_METHOD;
+ tsminfo[i].dobj.catId.tableoid = atooid(PQgetvalue(res, i, i_tableoid));
+ tsminfo[i].dobj.catId.oid = atooid(PQgetvalue(res, i, i_oid));
+ AssignDumpId(&tsminfo[i].dobj);
+ tsminfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_tsmname));
+ tsminfo[i].dobj.namespace = NULL;
+ tsminfo[i].tsmseqscan = PQgetvalue(res, i, i_tsmseqscan)[0] == 't';
+ tsminfo[i].tsmpagemode = PQgetvalue(res, i, i_tsmpagemode)[0] == 't';
+
+ /* Decide whether we want to dump it */
+ selectDumpableObject(&(tsminfo[i].dobj));
+ }
+
+ PQclear(res);
+
+ destroyPQExpBuffer(query);
+
+ return tsminfo;
+}
+
+
+/*
* Test whether a column should be printed as part of table's CREATE TABLE.
* Column number is zero-based.
*
@@ -8226,6 +8299,9 @@ dumpDumpableObject(Archive *fout, DumpOptions *dopt, DumpableObject *dobj)
case DO_POLICY:
dumpPolicy(fout, dopt, (PolicyInfo *) dobj);
break;
+ case DO_TABLESAMPLE_METHOD:
+ dumpTableSampleMethod(fout, dopt, (TSMInfo *) dobj);
+ break;
case DO_PRE_DATA_BOUNDARY:
case DO_POST_DATA_BOUNDARY:
/* never dumped, nothing to do */
@@ -12226,6 +12302,106 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
}
/*
+ * dumpTableSampleMethod
+ * write the declaration of one user-defined tablesample method
+ */
+static void
+dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo)
+{
+ PGresult *res;
+ PQExpBuffer q;
+ PQExpBuffer delq;
+ PQExpBuffer labelq;
+ PQExpBuffer query;
+ char *tsminit;
+ char *tsmnextblock;
+ char *tsmnexttuple;
+ char *tsmexaminetuple;
+ char *tsmend;
+ char *tsmreset;
+ char *tsmcost;
+
+ /* Skip if not to be dumped */
+ if (!tsminfo->dobj.dump || dopt->dataOnly)
+ return;
+
+ q = createPQExpBuffer();
+ delq = createPQExpBuffer();
+ labelq = createPQExpBuffer();
+ query = createPQExpBuffer();
+
+ /* Make sure we are in proper schema */
+ selectSourceSchema(fout, "pg_catalog");
+
+ appendPQExpBuffer(query, "SELECT tsminit, tsmnextblock, "
+ "tsmnexttuple, tsmexaminetuple, "
+ "tsmend, tsmreset, tsmcost "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid = '%u'::pg_catalog.oid",
+ tsminfo->dobj.catId.oid);
+
+ res = ExecuteSqlQueryForSingleRow(fout, query->data);
+
+ tsminit = PQgetvalue(res, 0, PQfnumber(res, "tsminit"));
+ tsmnexttuple = PQgetvalue(res, 0, PQfnumber(res, "tsmnexttuple"));
+ tsmnextblock = PQgetvalue(res, 0, PQfnumber(res, "tsmnextblock"));
+ tsmexaminetuple = PQgetvalue(res, 0, PQfnumber(res, "tsmexaminetuple"));
+ tsmend = PQgetvalue(res, 0, PQfnumber(res, "tsmend"));
+ tsmreset = PQgetvalue(res, 0, PQfnumber(res, "tsmreset"));
+ tsmcost = PQgetvalue(res, 0, PQfnumber(res, "tsmcost"));
+
+ appendPQExpBuffer(q, "CREATE TABLESAMPLE METHOD %s (\n",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(q, " INIT = %s,\n", tsminit);
+ appendPQExpBuffer(q, " NEXTTUPLE = %s,\n", tsmnexttuple);
+ appendPQExpBuffer(q, " NEXTBLOCK = %s,\n", tsmnextblock);
+ appendPQExpBuffer(q, " END = %s,\n", tsmend);
+ appendPQExpBuffer(q, " RESET = %s,\n", tsmreset);
+ appendPQExpBuffer(q, " COST = %s", tsmcost);
+
+ if (strcmp(tsmexaminetuple, "-") != 0)
+ appendPQExpBuffer(q, ",\n EXAMINETUPLE = %s", tsmexaminetuple);
+
+ if (tsminfo->tsmseqscan)
+ appendPQExpBufferStr(q, ",\n SEQSCAN = true");
+
+ if (tsminfo->tsmpagemode)
+ appendPQExpBufferStr(q, ",\n PAGEMODE = true");
+
+ appendPQExpBufferStr(q, "\n);");
+
+ appendPQExpBuffer(delq, "DROP TABLESAMPLE METHOD %s",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(labelq, "TABLESAMPLE METHOD %s",
+ fmtId(tsminfo->dobj.name));
+
+ if (dopt->binary_upgrade)
+ binary_upgrade_extension_member(q, &tsminfo->dobj, labelq->data);
+
+ ArchiveEntry(fout, tsminfo->dobj.catId, tsminfo->dobj.dumpId,
+ tsminfo->dobj.name,
+ NULL,
+ NULL,
+ "",
+ false, "TABLESAMPLE METHOD", SECTION_PRE_DATA,
+ q->data, delq->data, NULL,
+ NULL, 0,
+ NULL, NULL);
+
+ /* Dump tablesample method comments */
+ dumpComment(fout, dopt, labelq->data,
+ NULL, "",
+ tsminfo->dobj.catId, 0, tsminfo->dobj.dumpId);
+
+ PQclear(res);
+ destroyPQExpBuffer(q);
+ destroyPQExpBuffer(delq);
+ destroyPQExpBuffer(labelq);
+ destroyPQExpBuffer(query);
+}
+
+/*
* dumpTSParser
* write out a single text search parser
*/
@@ -15655,6 +15831,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
case DO_FDW:
case DO_FOREIGN_SERVER:
case DO_BLOB:
+ case DO_TABLESAMPLE_METHOD:
/* Pre-data objects: must come before the pre-data boundary */
addObjectDependency(preDataBound, dobj->dumpId);
break;
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index a9d3c10..87bef24 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -76,7 +76,8 @@ typedef enum
DO_POST_DATA_BOUNDARY,
DO_EVENT_TRIGGER,
DO_REFRESH_MATVIEW,
- DO_POLICY
+ DO_POLICY,
+ DO_TABLESAMPLE_METHOD
} DumpableObjectType;
typedef struct _dumpableObject
@@ -383,6 +384,13 @@ typedef struct _inhInfo
Oid inhparent; /* OID of its parent */
} InhInfo;
+typedef struct _tsmInfo
+{
+ DumpableObject dobj;
+ bool tsmseqscan;
+ bool tsmpagemode;
+} TSMInfo;
+
typedef struct _prsInfo
{
DumpableObject dobj;
@@ -536,6 +544,7 @@ extern ProcLangInfo *getProcLangs(Archive *fout, int *numProcLangs);
extern CastInfo *getCasts(Archive *fout, DumpOptions *dopt, int *numCasts);
extern void getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tbinfo, int numTables);
extern bool shouldPrintColumn(DumpOptions *dopt, TableInfo *tbinfo, int colno);
+extern TSMInfo *getTableSampleMethods(Archive *fout, int *numTSMs);
extern TSParserInfo *getTSParsers(Archive *fout, int *numTSParsers);
extern TSDictInfo *getTSDictionaries(Archive *fout, int *numTSDicts);
extern TSTemplateInfo *getTSTemplates(Archive *fout, int *numTSTemplates);
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index c5ed593..9567cf6 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -73,7 +73,8 @@ static const int oldObjectTypePriority[] =
13, /* DO_POST_DATA_BOUNDARY */
20, /* DO_EVENT_TRIGGER */
15, /* DO_REFRESH_MATVIEW */
- 21 /* DO_POLICY */
+ 21, /* DO_POLICY */
+ 5 /* DO_TABLESAMPLE_METHOD */
};
/*
@@ -122,7 +123,8 @@ static const int newObjectTypePriority[] =
25, /* DO_POST_DATA_BOUNDARY */
32, /* DO_EVENT_TRIGGER */
33, /* DO_REFRESH_MATVIEW */
- 34 /* DO_POLICY */
+ 34, /* DO_POLICY */
+ 17 /* DO_TABLESAMPLE_METHOD */
};
static DumpId preDataBoundId;
@@ -1460,6 +1462,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
"POLICY (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
+ case DO_TABLESAMPLE_METHOD:
+ snprintf(buf, bufsize,
+ "TABLESAMPLE METHOD %s (ID %d OID %u)",
+ obj->name, obj->dumpId, obj->catId.oid);
+ return;
case DO_PRE_DATA_BOUNDARY:
snprintf(buf, bufsize,
"PRE-DATA BOUNDARY (ID %d)",
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 6481ac8..30653f8 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -148,6 +148,7 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
+ OCLASS_TABLESAMPLEMETHOD, /* pg_tablesample_method */
MAX_OCLASS /* MUST BE LAST */
} ObjectClass;
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
index 98df9cb..217e3e0 100644
--- a/src/include/catalog/pg_tablesample_method.h
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -71,7 +71,18 @@ typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
DATA(insert OID = 3283 ( system false true tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
DESCR("SYSTEM table sampling method");
+#define TABLESAMPLE_METHOD_SYSTEM_OID 3283
DATA(insert OID = 3284 ( bernoulli true false tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
DESCR("BERNOULLI table sampling method");
+#define TABLESAMPLE_METHOD_BERNOULLI_OID 3284
+
+/* ----------------
+ * functions for manipulation of pg_tablesample_method
+ * ----------------
+ */
+
+extern Oid DefineTablesampleMethod(List *names, List *parameters);
+extern void RemoveTablesampleMethodById(Oid tsmoid);
+extern Oid get_tablesample_method_oid(const char *tsmname, bool missing_ok);
#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 3672e37..d4a7b69 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1298,6 +1298,7 @@ typedef enum ObjectType
OBJECT_SEQUENCE,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
+ OBJECT_TABLESAMPLEMETHOD,
OBJECT_TABLESPACE,
OBJECT_TRIGGER,
OBJECT_TSCONFIGURATION,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 0a7b650..515304c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -236,6 +236,7 @@ PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD)
+PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD)
PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("minvalue", MINVALUE, UNRESERVED_KEYWORD)
PG_KEYWORD("mode", MODE, UNRESERVED_KEYWORD)
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 93d93af..37ea524 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -9,7 +9,8 @@ SUBDIRS = \
worker_spi \
dummy_seclabel \
test_shm_mq \
- test_parser
+ test_parser \
+ tablesample
all: submake-errcodes
diff --git a/src/test/modules/tablesample/.gitignore b/src/test/modules/tablesample/.gitignore
new file mode 100644
index 0000000..5dcb3ff
--- /dev/null
+++ b/src/test/modules/tablesample/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/tablesample/Makefile b/src/test/modules/tablesample/Makefile
new file mode 100644
index 0000000..469b004
--- /dev/null
+++ b/src/test/modules/tablesample/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/tablesample/Makefile
+
+MODULE_big = tsm_test
+OBJS = tsm_test.o $(WIN32RES)
+PGFILEDESC = "tsm_test - example of a custom tablesample method"
+
+EXTENSION = tsm_test
+DATA = tsm_test--1.0.sql
+
+REGRESS = tablesample
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/tablesample/expected/tablesample.out b/src/test/modules/tablesample/expected/tablesample.out
new file mode 100644
index 0000000..ad62e32
--- /dev/null
+++ b/src/test/modules/tablesample/expected/tablesample.out
@@ -0,0 +1,38 @@
+CREATE EXTENSION tsm_test;
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ c81e728d9d4c2f636f067f89cc14862c | 0.5
+ a87ff679a2f3e71d9181a67b7542122c | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ d3d9446802a44259755d38e6d163e820 | 0.5
+(6 rows)
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ e4da3b7fbbce2345d7772b0674a318d5 | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ c9f0f895fb98ab9159f51fd0297e236d | 0.5
+(5 rows)
+
+DROP TABLESAMPLE METHOD tsm_test;
+ERROR: cannot drop tablesample method tsm_test because extension tsm_test requires it
+HINT: You can drop extension tsm_test instead.
+DROP EXTENSION tsm_test;
+ERROR: cannot drop extension tsm_test because other objects depend on it
+DETAIL: view test_tsm_v depends on tablesample method tsm_test
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
+DROP EXTENSION tsm_test CASCADE;
+NOTICE: drop cascades to view test_tsm_v
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
+ tsmname | tsmseqscan | tsmpagemode | tsminit | tsmnextblock | tsmnexttuple | tsmexaminetuple | tsmend | tsmreset | tsmcost
+---------+------------+-------------+---------+--------------+--------------+-----------------+--------+----------+---------
+(0 rows)
+
diff --git a/src/test/modules/tablesample/sql/tablesample.sql b/src/test/modules/tablesample/sql/tablesample.sql
new file mode 100644
index 0000000..b1104d6
--- /dev/null
+++ b/src/test/modules/tablesample/sql/tablesample.sql
@@ -0,0 +1,14 @@
+CREATE EXTENSION tsm_test;
+
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+
+DROP TABLESAMPLE METHOD tsm_test;
+DROP EXTENSION tsm_test;
+DROP EXTENSION tsm_test CASCADE;
+
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
diff --git a/src/test/modules/tablesample/tsm_test--1.0.sql b/src/test/modules/tablesample/tsm_test--1.0.sql
new file mode 100644
index 0000000..e5a9ae8
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test--1.0.sql
@@ -0,0 +1,52 @@
+/* src/test/modules/tablesample/tsm_test--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION tsm_test" to load this file. \quit
+
+CREATE FUNCTION tsm_test_init(internal, int4, text)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nextblock(internal)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nexttuple(internal, int4, int2)
+RETURNS int2
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_examinetuple(internal, int4, internal, bool)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_end(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_reset(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_cost(internal, internal, internal, internal, internal, internal, internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+
+CREATE TABLESAMPLE METHOD tsm_test (
+ SEQSCAN = true,
+ PAGEMODE = true,
+ INIT = tsm_test_init,
+ NEXTBLOCK = tsm_test_nextblock,
+ NEXTTUPLE = tsm_test_nexttuple,
+ EXAMINETUPLE = tsm_test_examinetuple,
+ END = tsm_test_end,
+ RESET = tsm_test_reset,
+ COST = tsm_test_cost
+);
diff --git a/src/test/modules/tablesample/tsm_test.c b/src/test/modules/tablesample/tsm_test.c
new file mode 100644
index 0000000..be4dcb9
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.c
@@ -0,0 +1,228 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_test.c
+ * Simple example of a custom tablesample method
+ *
+ * Copyright (c) 2007-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/tablesample/tsm_test.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "access/relscan.h"
+#include "access/tupdesc.h"
+#include "catalog/pg_type.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+PG_MODULE_MAGIC;
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ AttrNumber attnum; /* column to check */
+ TupleDesc tupDesc; /* tuple descriptor of table */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} tsm_test_state;
+
+
+PG_FUNCTION_INFO_V1(tsm_test_init);
+PG_FUNCTION_INFO_V1(tsm_test_nextblock);
+PG_FUNCTION_INFO_V1(tsm_test_nexttuple);
+PG_FUNCTION_INFO_V1(tsm_test_examinetuple);
+PG_FUNCTION_INFO_V1(tsm_test_end);
+PG_FUNCTION_INFO_V1(tsm_test_reset);
+PG_FUNCTION_INFO_V1(tsm_test_cost);
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_test_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ char *attname;
+ AttrNumber attnum;
+ Oid atttype;
+ Relation rel = scanstate->ss.ss_currentRelation;
+ HeapScanDesc scan = scanstate->ss.ss_currentScanDesc;
+ tsm_test_state *state;
+ TupleDesc tupDesc = RelationGetDescr(rel);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("Column name cannot be NULL.")));
+
+ attname = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+ attnum = get_attnum(rel->rd_id, attname);
+ if (!AttrNumberIsForUserDefinedAttr(attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s does not exist.", attname)));
+
+ atttype = get_atttype(rel->rd_id, attnum);
+ if (atttype != FLOAT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s is not of type float.", attname)));
+
+ state = palloc0(sizeof(tsm_test_state));
+
+ /* Remember initial values for reinit */
+ state->seed = seed;
+ state->attnum = attnum;
+ state->tupDesc = tupDesc;
+ state->startblock = scan->rs_startblock;
+ state->nblocks = scan->rs_nblocks;
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+ sampler_random_init_state(state->seed, state->randstate);
+
+ scanstate->tsmdata = (void *) state;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_test_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ /* Cycle from startblock to startblock to support syncscan. */
+ if (state->blockno == InvalidBlockNumber)
+ state->blockno = state->startblock;
+ else
+ {
+ state->blockno++;
+
+ if (state->blockno >= state->nblocks)
+ state->blockno = 0;
+
+ if (state->blockno == state->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(state->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ */
+Datum
+tsm_test_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ if (state->lt == InvalidOffsetNumber)
+ state->lt = FirstOffsetNumber;
+ else
+ state->lt++;
+
+ /* Past the end of the page: reset so the next block starts at the first offset. */
+ if (state->lt > maxoffset)
+ state->lt = InvalidOffsetNumber;
+
+ PG_RETURN_UINT16(state->lt);
+}
+
+/*
+ * Examine tuple and decide if it should be returned.
+ */
+Datum
+tsm_test_examinetuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ HeapTuple tuple = (HeapTuple) PG_GETARG_POINTER(2);
+ bool visible = PG_GETARG_BOOL(3);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+ bool isnull;
+ Datum datum;
+ float8 rand;
+
+ if (!visible)
+ PG_RETURN_BOOL(false);
+
+ datum = heap_getattr(tuple, state->attnum, state->tupDesc, &isnull);
+ rand = sampler_random_fract(state->randstate);
+
+ /* Test isnull first: a NULL datum must not be passed to DatumGetFloat8. */
+ if (isnull || DatumGetFloat8(datum) < rand)
+ PG_RETURN_BOOL(false);
+ else
+ PG_RETURN_BOOL(true);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_test_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_test_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+
+ sampler_random_init_state(state->seed, state->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_test_cost(PG_FUNCTION_ARGS)
+{
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+
+ *pages = baserel->pages;
+
+ /* This is a very rough estimate: assume half of the rows are returned */
+ *tuples = path->rows = path->rows/2;
+
+ PG_RETURN_VOID();
+}
+
diff --git a/src/test/modules/tablesample/tsm_test.control b/src/test/modules/tablesample/tsm_test.control
new file mode 100644
index 0000000..a7b2741
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.control
@@ -0,0 +1,5 @@
+# tsm_test extension
+comment = 'test module for custom tablesample method'
+default_version = '1.0'
+module_pathname = '$libdir/tsm_test'
+relocatable = true
--
1.9.1
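
For reviewers who want to check the syncscan wraparound in tsm_test_nextblock without building the module, here is a minimal standalone sketch of the same block cycling. The names block_cycle_state and block_cycle_next, and the local BlockNumber typedef, are hypothetical stand-ins for illustration only and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's BlockNumber and InvalidBlockNumber. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

typedef struct
{
	BlockNumber startblock;	/* block the scan starts at */
	BlockNumber nblocks;	/* total blocks in the relation */
	BlockNumber blockno;	/* current block; Invalid before the first call */
} block_cycle_state;

/*
 * Return the next block to read, or InvalidBlockNumber once the scan
 * has wrapped around and arrived back at startblock.
 */
BlockNumber
block_cycle_next(block_cycle_state *state)
{
	/* First call: begin at the start block. */
	if (state->blockno == InvalidBlockNumber)
	{
		state->blockno = state->startblock;
		return state->blockno;
	}

	state->blockno++;

	/* Wrap around to the beginning of the relation. */
	if (state->blockno >= state->nblocks)
		state->blockno = 0;

	/* Back at the start block: the full cycle is complete. */
	if (state->blockno == state->startblock)
		return InvalidBlockNumber;

	return state->blockno;
}
```

Starting at block 2 of a 4-block relation, successive calls yield 2, 3, 0, 1 and then InvalidBlockNumber.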
0002-tablesample-v10.patch (text/x-diff)
From 170685a1c3d6b8d8c74d1cc0f1af455f655438cb Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:37:55 +0100
Subject: [PATCH 2/3] tablesample v10
---
contrib/file_fdw/file_fdw.c | 2 +-
contrib/postgres_fdw/postgres_fdw.c | 2 +-
doc/src/sgml/catalogs.sgml | 112 ++++++
doc/src/sgml/ref/select.sgml | 38 +-
src/backend/access/Makefile | 3 +-
src/backend/access/heap/heapam.c | 43 ++-
src/backend/catalog/Makefile | 2 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/explain.c | 7 +
src/backend/executor/Makefile | 2 +-
src/backend/executor/execAmi.c | 8 +
src/backend/executor/execCurrent.c | 1 +
src/backend/executor/execProcnode.c | 14 +
src/backend/executor/nodeSamplescan.c | 556 ++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 60 +++
src/backend/nodes/equalfuncs.c | 37 ++
src/backend/nodes/nodeFuncs.c | 12 +
src/backend/nodes/outfuncs.c | 48 +++
src/backend/nodes/readfuncs.c | 45 +++
src/backend/optimizer/path/allpaths.c | 49 +++
src/backend/optimizer/path/costsize.c | 68 ++++
src/backend/optimizer/plan/createplan.c | 69 ++++
src/backend/optimizer/plan/setrefs.c | 11 +
src/backend/optimizer/plan/subselect.c | 1 +
src/backend/optimizer/util/pathnode.c | 22 ++
src/backend/parser/gram.y | 39 +-
src/backend/parser/parse_clause.c | 48 ++-
src/backend/parser/parse_func.c | 131 +++++++
src/backend/utils/Makefile | 3 +-
src/backend/utils/adt/ruleutils.c | 50 +++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/misc/sampling.c | 33 +-
src/backend/utils/tablesample/Makefile | 17 +
src/backend/utils/tablesample/bernoulli.c | 224 +++++++++++
src/backend/utils/tablesample/system.c | 185 +++++++++
src/include/access/heapam.h | 4 +
src/include/access/relscan.h | 1 +
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_proc.h | 25 ++
src/include/catalog/pg_tablesample_method.h | 77 ++++
src/include/executor/nodeSamplescan.h | 24 ++
src/include/nodes/execnodes.h | 18 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/parsenodes.h | 36 ++
src/include/nodes/plannodes.h | 6 +
src/include/optimizer/cost.h | 1 +
src/include/optimizer/pathnode.h | 2 +
src/include/parser/kwlist.h | 3 +-
src/include/parser/parse_func.h | 4 +
src/include/utils/rel.h | 1 -
src/include/utils/sampling.h | 15 +-
src/include/utils/syscache.h | 2 +
src/include/utils/tablesample.h | 27 ++
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/expected/tablesample.out | 168 +++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/tablesample.sql | 42 +++
58 files changed, 2397 insertions(+), 39 deletions(-)
create mode 100644 src/backend/executor/nodeSamplescan.c
create mode 100644 src/backend/utils/tablesample/Makefile
create mode 100644 src/backend/utils/tablesample/bernoulli.c
create mode 100644 src/backend/utils/tablesample/system.c
create mode 100644 src/include/catalog/pg_tablesample_method.h
create mode 100644 src/include/executor/nodeSamplescan.h
create mode 100644 src/include/utils/tablesample.h
create mode 100644 src/test/regress/expected/tablesample.out
create mode 100644 src/test/regress/sql/tablesample.sql
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d541..6a813a3 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1096,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 59aaff7..5b2335f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2539,7 +2539,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * sampler_random_fract());
+ pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate));
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 2325962..0335abc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -269,6 +269,11 @@
</row>
<row>
+ <entry><link linkend="catalog-pg-tablesample-method"><structname>pg_tablesample_method</structname></link></entry>
+ <entry>table sampling methods</entry>
+ </row>
+
+ <row>
<entry><link linkend="catalog-pg-tablespace"><structname>pg_tablespace</structname></link></entry>
<entry>tablespaces within this database cluster</entry>
</row>
@@ -5980,6 +5985,113 @@
</sect1>
+ <sect1 id="catalog-pg-tablesample-method">
+ <title><structname>pg_tablesample_method</structname></title>
+
+ <indexterm zone="catalog-pg-tablesample-method">
+ <primary>pg_tablesample_method</primary>
+ </indexterm>
+
+ <para>
+ The catalog <structname>pg_tablesample_method</structname> stores
+ information about table sampling methods which can be used in the
+ <command>TABLESAMPLE</command> clause of a <command>SELECT</command>
+ statement.
+ </para>
+
+ <table>
+ <title><structname>pg_tablesample_method</> Columns</title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+
+ <row>
+ <entry><structfield>oid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry></entry>
+ <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmname</structfield></entry>
+ <entry><type>name</type></entry>
+ <entry></entry>
+ <entry>Name of the sampling method</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmseqscan</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method scan the whole table sequentially?
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsminit</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Initialize the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnextblock</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next block number</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnexttuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next tuple offset</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmexaminetuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Function which examines the tuple contents and decides whether to
+ return it, or zero if none</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmend</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>End the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmreset</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Reset the state of the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmcost</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Costing function</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect1>
+
+
<sect1 id="catalog-pg-tablespace">
<title><structname>pg_tablespace</structname></title>
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..407bf9d 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,42 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ A <command>TABLESAMPLE</command> clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of the rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ one of:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept only a single argument,
+ which is the percentage (floating point from 0 to 100) of the rows to
+ be returned.
+ The <literal>SYSTEM</literal> sampling method does block-level
+ sampling with each block having the same chance of being selected and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> sampling method scans the whole table and
+ returns individual rows with equal probability.
+ The optional numeric parameter <literal>REPEATABLE</literal> is used
+ as the random seed for sampling. Note that subsequent commands may return
+ different results even if the same <literal>REPEATABLE</literal> clause
+ was specified. This happens because <acronym>DML</acronym> statements
+ and maintenance operations such as <command>VACUUM</> affect the physical
+ distribution of data.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..238057a 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ transam
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb6f8a3..76c2d3a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -79,8 +79,9 @@ bool synchronize_seqscans = true;
static HeapScanDesc heap_beginscan_internal(Relation relation,
Snapshot snapshot,
int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync,
- bool is_bitmapscan, bool temp_snap);
+ bool allow_strat, bool allow_sync, bool allow_pagemode,
+ bool is_bitmapscan, bool is_samplescan,
+ bool temp_snap);
static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -293,9 +294,10 @@ initscan(HeapScanDesc scan, ScanKey key, bool is_rescan)
/*
* Currently, we don't have a stats counter for bitmap heap scans (but the
- * underlying bitmap index scans will be counted).
+ * underlying bitmap index scans will be counted) or sample scans (we only
+ * update stats for tuple fetches there)
*/
- if (!scan->rs_bitmapscan)
+ if (!scan->rs_bitmapscan && !scan->rs_samplescan)
pgstat_count_heap_scan(scan->rs_rd);
}
@@ -314,7 +316,7 @@ heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, BlockNumber numBlks)
* In page-at-a-time mode it performs additional work, namely determining
* which tuples on the page are visible.
*/
-static void
+void
heapgetpage(HeapScanDesc scan, BlockNumber page)
{
Buffer buffer;
@@ -1289,7 +1291,7 @@ heap_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
* heap_beginscan - begin relation scan
*
* heap_beginscan_strat offers an extended API that lets the caller control
- * whether a nondefault buffer access strategy can be used, and whether
+ * whether a nondefault buffer access strategy can be used and whether
* syncscan can be chosen (possibly resulting in the scan not starting from
* block zero). Both of these default to TRUE with plain heap_beginscan.
*
@@ -1297,6 +1299,9 @@ heap_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
* HeapScanDesc for a bitmap heap scan. Although that scan technology is
* really quite unlike a standard seqscan, there is just enough commonality
* to make it worth using the same data structure.
+ *
+ * heap_beginscan_ss is an alternate entry point for setting up a
+ * HeapScanDesc for a TABLESAMPLE scan.
* ----------------
*/
HeapScanDesc
@@ -1304,7 +1309,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- true, true, false, false);
+ true, true, true, false, false, false);
}
HeapScanDesc
@@ -1314,7 +1319,7 @@ heap_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
Snapshot snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- true, true, false, true);
+ true, true, true, false, false, true);
}
HeapScanDesc
@@ -1323,7 +1328,8 @@ heap_beginscan_strat(Relation relation, Snapshot snapshot,
bool allow_strat, bool allow_sync)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- allow_strat, allow_sync, false, false);
+ allow_strat, allow_sync, true,
+ false, false, false);
}
HeapScanDesc
@@ -1331,14 +1337,24 @@ heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- false, false, true, false);
+ false, false, true, true, false, false);
+}
+
+HeapScanDesc
+heap_beginscan_ss(Relation relation, Snapshot snapshot,
+ int nkeys, ScanKey key,
+ bool allow_strat, bool allow_pagemode)
+{
+ return heap_beginscan_internal(relation, snapshot, nkeys, key,
+ allow_strat, false, allow_pagemode,
+ false, true, false);
}
static HeapScanDesc
heap_beginscan_internal(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync,
- bool is_bitmapscan, bool temp_snap)
+ bool allow_strat, bool allow_sync, bool allow_pagemode,
+ bool is_bitmapscan, bool is_samplescan, bool temp_snap)
{
HeapScanDesc scan;
@@ -1360,6 +1376,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
scan->rs_snapshot = snapshot;
scan->rs_nkeys = nkeys;
scan->rs_bitmapscan = is_bitmapscan;
+ scan->rs_samplescan = is_samplescan;
scan->rs_strategy = NULL; /* set in initscan */
scan->rs_allow_strat = allow_strat;
scan->rs_allow_sync = allow_sync;
@@ -1368,7 +1385,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
/*
* we can use page-at-a-time mode if it's an MVCC-safe snapshot
*/
- scan->rs_pageatatime = IsMVCCSnapshot(snapshot);
+ scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(snapshot);
/*
* For a seqscan in a serializable transaction, acquire a predicate lock
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..34db3e6 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesample_method.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fc9dd44..63feb07 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1146,7 +1146,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a951c55..20cc7a1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -732,6 +732,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -957,6 +958,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ pname = sname = "Sample Scan";
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1074,6 +1078,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1326,6 +1331,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2220,6 +2226,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 6ebad2f..4948a26 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index 1c8be25..5cfe549 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 9892499..03c2feb 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..c7951d1
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,556 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/pg_tablesample_method.h"
+#include "executor/executor.h"
+#include "executor/nodeSamplescan.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/predicate.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate,
+ int eflags, TableSampleClause *tablesample);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+static HeapTuple samplenexttup(SampleScanState *node, HeapScanDesc scan);
+
+
+/*
+ * Initialize the sampling method - loads function info and
+ * calls the tsminit function.
+ *
+ * We need special handling for this because the tsminit function
+ * is allowed to take optional additional arguments.
+ */
+static void
+InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(scanstate->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(scanstate->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(scanstate->tsmnexttuple));
+ if (OidIsValid(tablesample->tsmexaminetuple))
+ fmgr_info(tablesample->tsmexaminetuple, &(scanstate->tsmexaminetuple));
+ else
+ scanstate->tsmexaminetuple.fn_oid = InvalidOid;
+ fmgr_info(tablesample->tsmend, &(scanstate->tsmend));
+ fmgr_info(tablesample->tsmreset, &(scanstate->tsmreset));
+
+ InitFunctionCallInfoData(fcinfo, &scanstate->tsminit,
+ list_length(args) + 2,
+ InvalidOid, NULL, NULL);
+
+ /* First arg is always SampleScanState */
+ fcinfo.arg[0] = PointerGetDatum(scanstate);
+ fcinfo.argnull[0] = false;
+
+ /*
+ * Second arg is always REPEATABLE
+ * When tablesample->repeatable is NULL then REPEATABLE clause was not
+ * specified.
+ * When specified, the expression cannot evaluate to NULL.
+ */
+ if (tablesample->repeatable)
+ {
+ ExprState *argstate = ExecInitExpr((Expr *) tablesample->repeatable,
+ (PlanState *) scanstate);
+ fcinfo.arg[1] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[1], NULL);
+ if (fcinfo.argnull[1])
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("REPEATABLE clause cannot be NULL")));
+ }
+ else
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+
+ i = 2;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ (void) FunctionCallInvoke(&fcinfo);
+}
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is the workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ TupleTableSlot *slot;
+ HeapScanDesc scan;
+ HeapTuple tuple;
+
+ /*
+ * get information from the scan state
+ */
+ slot = node->ss.ss_ScanTupleSlot;
+ scan = node->ss.ss_currentScanDesc;
+
+ tuple = samplenexttup(node, scan);
+
+ if (tuple)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ scan->rs_cbuf, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+/*
+ * Check visibility of the tuple.
+ */
+static bool
+SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset, HeapScanDesc scan)
+{
+ /*
+ * If this scan is reading whole pages at a time, there is already
+ * visibility info present in rs_vistuples so we can just search it
+ * for the tupoffset.
+ */
+ if (scan->rs_pageatatime)
+ {
+ int start = 0,
+ end = scan->rs_ntuples - 1;
+
+ /*
+ * Do the binary search over rs_vistuples, it's already sorted by
+ * OffsetNumber so we don't need to do any sorting ourselves here.
+ *
+ * We could use bsearch() here, but it's slower for integers because
+ * of the function call overhead, and since it needs boilerplate code
+ * it would not save us anything code-wise anyway.
+ */
+ while (start <= end)
+ {
+ int mid = start + (end - start) / 2;
+ OffsetNumber curoffset = scan->rs_vistuples[mid];
+
+ if (curoffset == tupoffset)
+ return true;
+ else if (curoffset > tupoffset)
+ end = mid - 1;
+ else
+ start = mid + 1;
+ }
+
+ return false;
+ }
+ else
+ {
+ /* No pagemode, we have to check the tuple itself. */
+ Snapshot snapshot = scan->rs_snapshot;
+ Buffer buffer = scan->rs_cbuf;
+
+ bool visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+ CheckForSerializableConflictOut(visible, scan->rs_rd, tuple, buffer,
+ snapshot);
+
+ return visible;
+ }
+}
+
+/*
+ * Read next tuple using the correct sampling method.
+ */
+static HeapTuple
+samplenexttup(SampleScanState *node, HeapScanDesc scan)
+{
+ HeapTuple tuple = &(scan->rs_ctup);
+ bool pagemode = scan->rs_pageatatime;
+ BlockNumber blockno;
+ Page page;
+ ItemId itemid;
+ OffsetNumber tupoffset,
+ maxoffset;
+
+ if (!scan->rs_inited)
+ {
+ /*
+ * return null immediately if relation is empty
+ */
+ if (scan->rs_nblocks == 0)
+ {
+ Assert(!BufferIsValid(scan->rs_cbuf));
+ tuple->t_data = NULL;
+ return NULL;
+ }
+ blockno = DatumGetUInt32(FunctionCall1(&node->tsmnextblock,
+ PointerGetDatum(node)));
+ if (!BlockNumberIsValid(blockno))
+ {
+ tuple->t_data = NULL;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+ scan->rs_inited = true;
+ }
+ else
+ {
+ /* continue from previously returned page/tuple */
+ blockno = scan->rs_cblock; /* current page */
+ }
+
+ /*
+ * When pagemode is disabled, the scan will do visibility checks for each
+ * tuple it finds so the buffer needs to be locked.
+ */
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ maxoffset = PageGetMaxOffsetNumber(page);
+
+ for (;;)
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ tupoffset = DatumGetUInt16(FunctionCall3(&node->tsmnexttuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset)));
+
+ if (OffsetNumberIsValid(tupoffset))
+ {
+ bool visible;
+ bool found;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_data = (HeapTupleHeader) PageGetItem((Page) page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&(tuple->t_self), blockno, tupoffset);
+
+ visible = SampleTupleVisible(tuple, tupoffset, scan);
+
+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples for
+ * statistical purposes, but not return them since user should
+ * never see invisible tuples.
+ */
+ if (OidIsValid(node->tsmexaminetuple.fn_oid))
+ {
+ found = DatumGetBool(FunctionCall4(&node->tsmexaminetuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ PointerGetDatum(tuple),
+ BoolGetDatum(visible)));
+ /* Should not happen if sampling method is well written. */
+ if (found && !visible)
+ elog(ERROR, "Sampling method wanted to return invisible tuple");
+ }
+ else
+ found = visible;
+
+ /* Found visible tuple, return it. */
+ if (found)
+ {
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ break;
+ }
+ else
+ {
+ /* Try next tuple from same page. */
+ continue;
+ }
+ }
+
+
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
+ blockno = DatumGetUInt32(FunctionCall1(&node->tsmnextblock,
+ PointerGetDatum(node)));
+
+ /*
+ * Report our new scan position for synchronization purposes. We
+ * don't do that when moving backwards, however. That would just
+ * mess up any other forward-moving scanners.
+ *
+ * Note: we do this before checking for end of scan so that the
+ * final state of the position hint is back at the start of the
+ * rel. That's not strictly necessary, but otherwise when you run
+ * the same query multiple times the starting position would shift
+ * a little bit backwards on every invocation, which is confusing.
+ * We don't guarantee any specific ordering in general, though.
+ */
+ if (scan->rs_syncscan)
+ ss_report_location(scan->rs_rd, BlockNumberIsValid(blockno) ?
+ blockno : scan->rs_startblock);
+
+ /*
+ * Reached end of scan.
+ */
+ if (!BlockNumberIsValid(blockno))
+ {
+ if (BufferIsValid(scan->rs_cbuf))
+ ReleaseBuffer(scan->rs_cbuf);
+ scan->rs_cbuf = InvalidBuffer;
+ scan->rs_cblock = InvalidBlockNumber;
+ tuple->t_data = NULL;
+ scan->rs_inited = false;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ pgstat_count_heap_getnext(scan->rs_rd);
+
+ return &(scan->rs_ctup);
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation using the sampling method and returns
+ * the next qualifying tuple.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+ TableSampleClause *tablesample)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+
+ /*
+ * Even though we aren't going to do a conventional seqscan, it is useful
+ * to create a HeapScanDesc --- many of the fields in it are usable.
+ */
+ node->ss.ss_currentScanDesc =
+ heap_beginscan_ss(currentRelation, estate->es_snapshot, 0, NULL,
+ tablesample->tsmseqscan, tablesample->tsmpagemode);
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags, rte->tablesample);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ InitSamplingMethod(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell sampling function that we finished the scan.
+ */
+ FunctionCall1(&node->tsmend, PointerGetDatum(node));
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close heap scan
+ */
+ heap_endscan(node->ss.ss_currentScanDesc);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *node)
+{
+ heap_rescan(node->ss.ss_currentScanDesc, NULL);
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ FunctionCall1(&node->tsmreset, PointerGetDatum(node));
+
+ ExecScanReScan(&node->ss);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 291e6a7..24be3ad 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -630,6 +630,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2013,6 +2029,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2145,6 +2162,40 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsmseqscan);
+ COPY_SCALAR_FIELD(tsmpagemode);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmexaminetuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_SCALAR_FIELD(tsmcost);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4086,6 +4137,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4734,6 +4788,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index fcd58ad..8ab3b0b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2322,6 +2322,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2441,6 +2442,36 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsmseqscan);
+ COMPARE_SCALAR_FIELD(tsmpagemode);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmexaminetuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_SCALAR_FIELD(tsmcost);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3159,6 +3190,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_FuncWithArgs:
retval = _equalFuncWithArgs(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index d6f1f5b..7742189 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3219,6 +3219,18 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->repeatable, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index fc418fc..c087495 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -580,6 +580,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -2402,6 +2410,36 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_BOOL_FIELD(tsmseqscan);
+ WRITE_BOOL_FIELD(tsmpagemode);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmexaminetuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2431,6 +2469,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2929,6 +2968,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3270,6 +3312,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 563209c..05ed9a8 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,46 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_BOOL_FIELD(tsmseqscan);
+ READ_BOOL_FIELD(tsmpagemode);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmexaminetuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_OID_FIELD(tsmcost);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1218,6 +1258,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1313,6 +1354,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..5f12477 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,10 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -265,6 +269,11 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_size(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Sampled relation */
+ set_tablesample_rel_size(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -332,6 +341,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +432,41 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_size
+ * Set size estimates for a sampled relation.
+ */
+static void
+set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ /* Mark rel with estimated output rows, width, etc */
+ set_baserel_size_estimates(root, rel);
+}
+
+/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+ Path *path;
+
+ /*
+ * We don't support pushing join clauses into the quals of a samplescan,
+ * but it could still have required parameterization due to LATERAL refs
+ * in its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ path = create_samplescan_path(root, rel, required_outer);
+ rel->pathlist = list_make1(path);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 1a0d358..51583a1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -90,6 +90,7 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
#include "utils/spccache.h"
+#include "utils/tablesample.h"
#include "utils/tuplesort.h"
@@ -220,6 +221,73 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * From planner/optimizer perspective, we don't care all that much about cost
+ * itself since there is always only one scan path to consider when sampling
+ * scan is present, but number of rows estimation is still important.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'path' is the path to cost; its param_info is set if it is parameterized
+ */
+void
+cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+ BlockNumber pages;
+ double tuples;
+ RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
+ TableSampleClause *tablesample = rte->tablesample;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Mark the path with the correct row estimate */
+ if (path->param_info)
+ path->rows = path->param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ /* Call the sampling method's costing function. */
+ OidFunctionCall6(tablesample->tsmcost, PointerGetDatum(root),
+ PointerGetDatum(path), PointerGetDatum(baserel),
+ PointerGetDatum(tablesample->args),
+ PointerGetDatum(&pages), PointerGetDatum(&tuples));
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+
+ spc_page_cost = tablesample->tsmseqscan ? spc_seq_page_cost :
+ spc_random_page_cost;
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_page_cost * pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, path->param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * tuples;
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index cb69c03..3fc84e2 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -58,6 +58,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -100,6 +102,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -228,6 +231,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -343,6 +347,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -546,6 +557,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1133,6 +1145,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3321,6 +3372,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ec828cd..82771dc 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -445,6 +445,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index acfd0bc..9971b54 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2167,6 +2167,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index faca30b..ea7a47b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,26 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+Path *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ Path *pathnode = makeNode(Path);
+
+ pathnode->pathtype = T_SampleScan;
+ pathnode->parent = rel;
+ pathnode->param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->pathkeys = NIL; /* samplescan has unordered result */
+
+ cost_samplescan(pathnode, root, rel);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1778,6 +1798,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return (Path *) create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index cf0d317..41f71d6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -448,6 +448,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -615,8 +616,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10223,6 +10224,12 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample opt_alias_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10548,6 +10555,31 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $2;
+ n->relation = $1;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = $4;
+ n->repeatable = $6;
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' a_expr ')' { $$ = (Node *) $3; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13463,7 +13495,6 @@ unreserved_keyword:
| RELATIVE_P
| RELEASE
| RENAME
- | REPEATABLE
| REPLACE
| REPLICA
| RESET
@@ -13589,6 +13620,7 @@ col_name_keyword:
| SETOF
| SMALLINT
| SUBSTRING
+ | TABLESAMPLE
| TIME
| TIMESTAMP
| TREAT
@@ -13636,6 +13668,7 @@ type_func_name_keyword:
| NOTNULL
| OUTER_P
| OVERLAPS
+ | REPEATABLE
| RIGHT
| SIMILAR
| VERBOSE
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 8d90b50..63c6f9a 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -414,6 +417,28 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
+{
+ RangeTblEntry *rte;
+ TableSampleClause *tablesample = NULL;
+
+ rte = transformTableEntry(pstate, r->relation);
+
+ if (rte->relkind != RELKIND_RELATION &&
+ rte->relkind != RELKIND_MATVIEW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("TABLESAMPLE clause can only be used on tables and materialized views"),
+ parser_errposition(pstate,
+ exprLocation((Node *) r))));
+
+ tablesample = ParseTableSample(pstate, r->method, r->repeatable, r->args);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -422,7 +447,7 @@ transformTableEntry(ParseState *pstate, RangeVar *r)
{
RangeTblEntry *rte;
- /* We need only build a range table entry */
+ /* We first need to build a range table entry */
rte = addRangeTableEntry(pstate, r, r->alias,
interpretInhOption(r->inhOpt), true);
@@ -1122,6 +1147,27 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 53bbaec..090f2a6 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesample_method.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -760,6 +762,135 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+extern TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
+ List *sampleargs)
+{
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, initnargs;
+ Oid actual_arg_types[FUNC_MAX_ARGS];
+ Oid init_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ samplemethod)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+
+ tablesample->tsmseqscan = tsm->tsmseqscan;
+ tablesample->tsmpagemode = tsm->tsmpagemode;
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmexaminetuple = tsm->tsmexaminetuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+ tablesample->tsmcost = tsm->tsmcost;
+
+ ReleaseSysCache(tuple);
+
+ /* Validate the parameters against init function definition. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ initnargs = procform->pronargs;
+ Assert(initnargs >= 3);
+
+ /*
+ * The first parameter passes the SampleScanState and the second is the
+ * seed (REPEATABLE). Skip processing them here; just assert that the
+ * types are correct.
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ Assert(procform->proargtypes.values[1] == INT4OID);
+ initnargs -= 2;
+ memcpy(init_arg_types, procform->proargtypes.values + 2,
+ initnargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Process repeatable (seed) */
+ if (repeatable != NULL)
+ {
+ Node *arg = repeatable;
+
+ if (arg && IsA(arg, A_Const))
+ {
+ A_Const *con = (A_Const *) arg;
+
+ if (con->val.type == T_Null)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value"),
+ parser_errposition(pstate, con->location)));
+
+ }
+
+ arg = transformExpr(pstate, arg, EXPR_KIND_FROM_FUNCTION);
+ arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
+ tablesample->repeatable = arg;
+ }
+ else
+ tablesample->repeatable = NULL;
+
+ /* Transform the rest of arguments ... */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *arg = transformExpr(pstate, (Node *) lfirst(larg), EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ fargs = lappend(fargs, arg);
+
+ actual_arg_types[nargs++] = argtype;
+ }
+
+ /*
+ * Check if parameters are correct.
+ *
+ * XXX: can we do better at hinting here?
+ */
+ if (initnargs != nargs ||
+ !can_coerce_type(initnargs, actual_arg_types, init_arg_types,
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameters for tablesample method \"%s\"",
+ samplemethod)));
+
+ /* perform the necessary typecasting of arguments */
+ make_fn_arguments(pstate, fargs, actual_arg_types, init_arg_types);
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..9daa2ae 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,8 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time \
+ tablesample
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 2fa30be..49e7a84 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -31,6 +31,7 @@
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -343,6 +344,8 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags);
static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags, int wrapColumn);
+static void get_tablesample_def(TableSampleClause *tablesample,
+ deparse_context *context);
static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
TupleDesc resultDesc,
int prettyFlags, int wrapColumn, int startIndent);
@@ -4184,6 +4187,50 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
heap_close(ev_relation, AccessShareLock);
}
+/* ----------
+ * get_tablesample_def - Convert TableSampleClause back to SQL
+ * ----------
+ */
+static void
+get_tablesample_def(TableSampleClause *tablesample, deparse_context *context)
+{
+ StringInfo buf = context->buf;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ char *tsmname;
+ int nargs;
+ ListCell *l;
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tablesample->tsmid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("cache lookup failed for tablesample method %u",
+ tablesample->tsmid)));
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ tsmname = NameStr(tsm->tsmname);
+ appendStringInfo(buf, " TABLESAMPLE %s (", quote_identifier(tsmname));
+
+ ReleaseSysCache(tuple);
+
+ nargs = 0;
+ foreach(l, tablesample->args)
+ {
+ if (nargs++ > 0)
+ appendStringInfoString(buf, ", ");
+ get_rule_expr((Node *) lfirst(l), context, true);
+ }
+ appendStringInfoChar(buf, ')');
+
+ if (tablesample->repeatable != NULL)
+ {
+ appendStringInfoString(buf, " REPEATABLE (");
+ get_rule_expr(tablesample->repeatable, context, true);
+ appendStringInfoChar(buf, ')');
+ }
+}
/* ----------
* get_query_def - Parse back one query parsetree
@@ -8435,6 +8482,9 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
only_marker(rte),
generate_relation_name(rte->relid,
context->namespaces));
+
+ if (rte->tablesample)
+ get_tablesample_def(rte->tablesample, context);
break;
case RTE_SUBQUERY:
/* Subquery RTE */
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..3a8f01e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesample_method_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index 1eeabaf..f213c46 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -46,6 +46,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
bs->n = samplesize;
bs->t = 0; /* blocks scanned so far */
bs->m = 0; /* blocks selected so far */
+
+ sampler_random_init_state(randseed, bs->randstate);
}
bool
@@ -92,7 +94,7 @@ BlockSampler_Next(BlockSampler bs)
* less than k, which means that we cannot fail to select enough blocks.
*----------
*/
- V = sampler_random_fract();
+ V = sampler_random_fract(bs->randstate);
p = 1.0 - (double) k / (double) K;
while (V < p)
{
@@ -126,8 +128,14 @@ BlockSampler_Next(BlockSampler bs)
void
reservoir_init_selection_state(ReservoirState rs, int n)
{
+ /*
+ * Reservoir sampling is not used anywhere where it would need to return
+ * repeatable results so we can initialize it randomly.
+ */
+ sampler_random_init_state(random(), rs->randstate);
+
/* Initial value of W (for use when Algorithm Z is first applied) */
- *rs = exp(-log(sampler_random_fract()) / n);
+ rs->W = exp(-log(sampler_random_fract(rs->randstate)) / n);
}
double
@@ -142,7 +150,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
double V,
quot;
- V = sampler_random_fract(); /* Generate V */
+ V = sampler_random_fract(rs->randstate); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -158,7 +166,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
else
{
/* Now apply Algorithm Z */
- double W = *rs;
+ double W = rs->W;
double term = t - (double) n + 1;
for (;;)
@@ -174,7 +182,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
tmp;
/* Generate U and X */
- U = sampler_random_fract();
+ U = sampler_random_fract(rs->randstate);
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -203,11 +211,11 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract(rs->randstate)) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
- *rs = W;
+ rs->W = W;
}
return S;
}
@@ -217,10 +225,17 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
* Random number generator used by sampling
*----------
*/
+void
+sampler_random_init_state(long seed, SamplerRandomState randstate)
+{
+ randstate[0] = 0x330e;
+ randstate[1] = (unsigned short) seed;
+ randstate[2] = (unsigned short) (seed >> 16);
+}
/* Select a random value R uniformly distributed in (0 - 1) */
double
-sampler_random_fract()
+sampler_random_fract(SamplerRandomState randstate)
{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+ return pg_erand48(randstate);
}
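Reviewer note on the sampling.c hunks above: the point of threading a per-sampler state through sampler_random_fract() is that two scans seeded identically draw identical sequences, which is what REPEATABLE needs. Below is a minimal standalone sketch of that property. It assumes POSIX erand48() as a stand-in for pg_erand48(); DemoRandomState, demo_init_state, demo_fract and demo_streams_match are hypothetical names that do not appear in the patch. Note that erand48() returns values in [0.0, 1.0), so 0.0 is a possible result.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for SamplerRandomState: 48 bits of generator state. */
typedef unsigned short DemoRandomState[3];

/* Same seeding layout as the patch: fixed low word, seed split over the rest. */
static void
demo_init_state(long seed, DemoRandomState randstate)
{
	randstate[0] = 0x330e;
	randstate[1] = (unsigned short) seed;
	randstate[2] = (unsigned short) (seed >> 16);
}

/* Assumption: erand48() stands in for pg_erand48(); result is in [0.0, 1.0). */
static double
demo_fract(DemoRandomState randstate)
{
	return erand48(randstate);
}

/* Returns 1 if two states built from the same seed yield the same n values. */
static int
demo_streams_match(long seed, int n)
{
	DemoRandomState a;
	DemoRandomState b;
	int			i;

	assert(n > 0);
	demo_init_state(seed, a);
	demo_init_state(seed, b);

	for (i = 0; i < n; i++)
	{
		if (demo_fract(a) != demo_fract(b))
			return 0;
	}
	return 1;
}
```

Calling demo_streams_match(42, 100) compares two independently seeded generators. With the old global random() there was no way to get this guarantee, because any other caller could advance the shared state.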
diff --git a/src/backend/utils/tablesample/Makefile b/src/backend/utils/tablesample/Makefile
new file mode 100644
index 0000000..df92939
--- /dev/null
+++ b/src/backend/utils/tablesample/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/tablesample
+#
+# IDENTIFICATION
+# src/backend/utils/tablesample/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = system.o bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/tablesample/bernoulli.c b/src/backend/utils/tablesample/bernoulli.c
new file mode 100644
index 0000000..36f4bcb
--- /dev/null
+++ b/src/backend/utils/tablesample/bernoulli.c
@@ -0,0 +1,224 @@
+/*-------------------------------------------------------------------------
+ *
+ * bernoulli.c
+ * interface routines for BERNOULLI tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/relscan.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* number of blocks */
+ BlockNumber blockno; /* current block */
+ float4 probability; /* probability that tuple will be returned (0.0-1.0) */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} BernoulliSamplerData;
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ HeapScanDesc scan = scanstate->ss.ss_currentScanDesc;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->startblock = scan->rs_startblock;
+ sampler->nblocks = scan->rs_nblocks;
+ sampler->blockno = InvalidBlockNumber;
+ sampler->probability = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ /*
+ * Bernoulli sampling scans all blocks on the table and supports
+ * syncscan so loop from startblock to startblock instead of
+ * from 0 to nblocks.
+ */
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = sampler->startblock;
+ else
+ {
+ sampler->blockno++;
+
+ if (sampler->blockno >= sampler->nblocks)
+ sampler->blockno = 0;
+
+ if (sampler->blockno == sampler->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ *
+ * This method implements the main logic of Bernoulli sampling.
+ * For each tuple offset it generates a new random number (in the 0.0-1.0
+ * range) and, if the number falls within the user-specified probability
+ * (in the same range), returns that tuple offset.
+ *
+ * If we reach end of the block return InvalidOffsetNumber which tells
+ * SampleScan to go to next block.
+ */
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ float4 probability = sampler->probability;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /*
+ * Loop over tuple offsets until the random generator returns value that
+ * is within the probability of returning the tuple or until we reach
+ * end of the block.
+ *
+ * (This is our implementation of bernoulli trial)
+ */
+ while (sampler_random_fract(sampler->randstate) > probability)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_bernoulli_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ *pages = baserel->pages;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1f;
+ }
+
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
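Reviewer note on bernoulli.c above: the two interesting pieces are the wrap-around block walk (startblock to startblock, for syncscan) and the per-offset Bernoulli trial. The following is a standalone sketch of both, not the patch's code. All Demo* and demo_* names are hypothetical, POSIX erand48() is assumed in place of pg_erand48(), and two details deliberately differ from the patch: the bounds check runs before the random draw, and the comparison is ">=" rather than ">" so that a probability of 0 can never select a tuple even if the generator returns exactly 0.0.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>

#define DEMO_INVALID_BLOCK	0xFFFFFFFFu	/* plays the role of InvalidBlockNumber */
#define DEMO_INVALID_OFFSET 0			/* plays the role of InvalidOffsetNumber */

typedef struct
{
	unsigned int startblock;	/* first block to visit (syncscan start) */
	unsigned int nblocks;		/* number of blocks in the relation */
	unsigned int blockno;		/* current block */
	unsigned int lt;			/* last offset returned from current block */
	double		probability;	/* 0.0 - 1.0 */
	unsigned short randstate[3];
} DemoBernoulli;

static void
demo_init(DemoBernoulli *s, unsigned int startblock, unsigned int nblocks,
		  double percent, long seed)
{
	assert(nblocks > 0 && startblock < nblocks);
	assert(percent >= 0.0 && percent <= 100.0);

	s->startblock = startblock;
	s->nblocks = nblocks;
	s->blockno = DEMO_INVALID_BLOCK;
	s->lt = DEMO_INVALID_OFFSET;
	s->probability = percent / 100.0;
	s->randstate[0] = 0x330e;
	s->randstate[1] = (unsigned short) seed;
	s->randstate[2] = (unsigned short) (seed >> 16);
}

/* Visit every block once, starting at startblock and wrapping at nblocks. */
static unsigned int
demo_next_block(DemoBernoulli *s)
{
	if (s->blockno == DEMO_INVALID_BLOCK)
		s->blockno = s->startblock;
	else
	{
		s->blockno++;
		if (s->blockno >= s->nblocks)
			s->blockno = 0;
		if (s->blockno == s->startblock)
			return DEMO_INVALID_BLOCK;	/* back where we started: done */
	}
	return s->blockno;
}

/* One Bernoulli trial per offset; returns the next selected offset or 0. */
static unsigned int
demo_next_tuple(DemoBernoulli *s, unsigned int maxoffset)
{
	unsigned int off = (s->lt == DEMO_INVALID_OFFSET) ? 1 : s->lt + 1;

	/* Skip offsets whose draw falls outside the requested probability. */
	while (off <= maxoffset && erand48(s->randstate) >= s->probability)
		off++;

	if (off > maxoffset)
		off = DEMO_INVALID_OFFSET;	/* caller should move to the next block */

	s->lt = off;
	return off;
}
```

A caller would alternate demo_next_block() with repeated demo_next_tuple() calls until the latter returns DEMO_INVALID_OFFSET, which is roughly the contract SampleScan appears to expect from nextblock and nexttuple.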
diff --git a/src/backend/utils/tablesample/system.c b/src/backend/utils/tablesample/system.c
new file mode 100644
index 0000000..07d1f3a
--- /dev/null
+++ b/src/backend/utils/tablesample/system.c
@@ -0,0 +1,185 @@
+/*-------------------------------------------------------------------------
+ *
+ * system.c
+ * interface routines for system tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/*
+ * State
+ */
+typedef struct
+{
+ BlockSamplerData bs;
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ int samplesize; /* number of blocks to return */
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks =
+ RelationGetNumberOfBlocks(scanstate->ss.ss_currentRelation);
+ sampler->samplesize = 1 + (int) (sampler->tblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses the same logic as ANALYZE for picking the random blocks.
+ */
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1f;
+ }
+
+ *pages = baserel->pages * samplesize;
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
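Reviewer note on tsm_system_init above: the block count is computed as 1 + (int) (tblocks * (percent / 100.0)), so it is worth seeing what that yields at the edges. The sketch below reproduces only that arithmetic. demo_system_samplesize and demo_system_samplesize_clamped are hypothetical names, and the clamped variant is just one possible adjustment for comparison, not something the patch does. Whether the extra block at 100 percent matters depends on BlockSampler_HasMore(), which is outside this part of the diff.

```c
#include <assert.h>

/* Same arithmetic as the patch: always at least one block. */
static int
demo_system_samplesize(unsigned int tblocks, float percent)
{
	assert(percent >= 0 && percent <= 100);
	return 1 + (int) (tblocks * (percent / 100.0));
}

/* Hypothetical variant: never ask for more blocks than the relation has. */
static int
demo_system_samplesize_clamped(unsigned int tblocks, float percent)
{
	int			n = demo_system_samplesize(tblocks, percent);

	return ((unsigned int) n > tblocks) ? (int) tblocks : n;
}
```

For a 1000-block table, 50 percent gives 501 blocks, 0 percent still gives 1 block, and 100 percent gives 1001, one more than the table contains; the clamped variant returns 1000 in the last case.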
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 939d93d..94458aa 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -113,8 +113,12 @@ extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot,
bool allow_strat, bool allow_sync);
extern HeapScanDesc heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key);
+extern HeapScanDesc heap_beginscan_ss(Relation relation, Snapshot snapshot,
+ int nkeys, ScanKey key,
+ bool allow_strat, bool allow_pagemode);
extern void heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk,
BlockNumber endBlk);
+extern void heapgetpage(HeapScanDesc scan, BlockNumber page);
extern void heap_rescan(HeapScanDesc scan, ScanKey key);
extern void heap_endscan(HeapScanDesc scan);
extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 9bb6362..e2b2b4f 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -29,6 +29,7 @@ typedef struct HeapScanDescData
int rs_nkeys; /* number of scan keys */
ScanKey rs_key; /* array of scan key descriptors */
bool rs_bitmapscan; /* true if this is really a bitmap scan */
+ bool rs_samplescan; /* true if this is really a sample scan */
bool rs_pageatatime; /* verify visibility page-at-a-time? */
bool rs_allow_strat; /* allow or disallow use of access strategy */
bool rs_allow_sync; /* allow or disallow use of syncscan */
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..c711cca 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_name_index, 3281, on pg_tablesample_method using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3281
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_oid_index, 3282, on pg_tablesample_method using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3282
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b8a3660..a1a6f16 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5149,6 +5149,31 @@ DESCR("rank of hypothetical row without gaps");
DATA(insert OID = 3993 ( dense_rank_final PGNSP PGUID 12 1 0 2276 0 f f f f f f i 2 0 20 "2281 2276" "{2281,2276}" "{i,v}" _null_ _null_ hypothetical_dense_rank_final _null_ _null_ _null_ ));
DESCR("aggregate final function");
+DATA(insert OID = 3285 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3286 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3287 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3288 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3289 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+DATA(insert OID = 3290 ( tsm_system_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_system_cost _null_ _null_ _null_ ));
+DESCR("tsm_system_cost(internal)");
+
+DATA(insert OID = 3291 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3292 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3293 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3294 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3296 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
+DATA(insert OID = 3297 ( tsm_bernoulli_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_cost(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
new file mode 100644
index 0000000..98df9cb
--- /dev/null
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -0,0 +1,77 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesample_method.h
+ * definition of the tablesample methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesample_method.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLE_METHOD_H
+#define PG_TABLESAMPLE_METHOD_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_tablesample_method definition. cpp turns this into
+ * typedef struct FormData_pg_tablesample_method
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3280
+
+CATALOG(pg_tablesample_method,3280)
+{
+ NameData tsmname; /* tablesample method name */
+ bool tsmseqscan; /* does this method scan whole table sequentially? */
+ bool tsmpagemode; /* does this method scan page at a time? */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockNumber if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmexaminetuple; /* optional function which can examine tuple contents and
+ decide if tuple should be returned or not */
+ regproc tsmend; /* end scan function */
+ regproc tsmreset; /* reset state - used by rescan */
+ regproc tsmcost; /* costing function */
+} FormData_pg_tablesample_method;
+
+/* ----------------
+ * Form_pg_tablesample_method corresponds to a pointer to a tuple with
+ * the format of pg_tablesample_method relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
+
+/* ----------------
+ * compiler constants for pg_tablesample_method
+ * ----------------
+ */
+#define Natts_pg_tablesample_method 10
+#define Anum_pg_tablesample_method_tsmname 1
+#define Anum_pg_tablesample_method_tsmseqscan 2
+#define Anum_pg_tablesample_method_tsmpagemode 3
+#define Anum_pg_tablesample_method_tsminit 4
+#define Anum_pg_tablesample_method_tsmnextblock 5
+#define Anum_pg_tablesample_method_tsmnexttuple 6
+#define Anum_pg_tablesample_method_tsmexaminetuple 7
+#define Anum_pg_tablesample_method_tsmend 8
+#define Anum_pg_tablesample_method_tsmreset 9
+#define Anum_pg_tablesample_method_tsmcost 10
+
+/* ----------------
+ * initial contents of pg_tablesample_method
+ * ----------------
+ */
+
+DATA(insert OID = 3283 ( system false true tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3284 ( bernoulli true false tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 59b17f3..0a7f4da 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1213,6 +1213,24 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+
+ /* Sampling method functions. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmexaminetuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+
+ void *tsmdata; /* for use by tablesample method */
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 38469ef..caaedbf 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,6 +61,7 @@ typedef enum NodeTag
T_ValuesScan,
T_CteScan,
T_WorkTableScan,
+ T_SampleScan,
T_ForeignScan,
T_CustomScan,
T_Join,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -414,6 +416,8 @@ typedef enum NodeTag
T_WithClause,
T_CommonTableExpr,
T_RoleSpec,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 38ed661..3672e37 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -334,6 +334,26 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - information about a sampling method
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ bool tsmseqscan;
+ bool tsmpagemode;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmexaminetuple;
+ Oid tsmend;
+ Oid tsmreset;
+ Oid tsmcost;
+ Node *repeatable;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -534,6 +554,21 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than SQL Standard so we pass generic function
+ * arguments to the sampling method.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ Node *repeatable;
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -778,6 +813,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f6683f0..5289c43 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -279,6 +279,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..24003ae 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9923f0e..89c8ded 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c243ec..0a7b650 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -312,7 +312,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD)
PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD)
PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD)
-PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD)
+PG_KEYWORD("repeatable", REPEATABLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD)
PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD)
PG_KEYWORD("reset", RESET, UNRESERVED_KEYWORD)
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, COL_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 3264691..6727e55 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,10 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod,
+ Node *repeatable, List *args);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 6bd786d..185bd81 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index e3e7f9c..4ac208d 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -15,7 +15,12 @@
#include "storage/bufmgr.h"
-extern double sampler_random_fract(void);
+/* Random generator for sampling code */
+typedef unsigned short SamplerRandomState[3];
+
+extern void sampler_random_init_state(long seed,
+ SamplerRandomState randstate);
+extern double sampler_random_fract(SamplerRandomState randstate);
/* Block sampling methods */
/* Data structure for Algorithm S from Knuth 3.4.2 */
@@ -25,6 +30,7 @@ typedef struct
int n; /* desired sample size */
BlockNumber t; /* current block number */
int m; /* blocks selected so far */
+ SamplerRandomState randstate; /* random generator state */
} BlockSamplerData;
typedef BlockSamplerData *BlockSampler;
@@ -35,7 +41,12 @@ extern bool BlockSampler_HasMore(BlockSampler bs);
extern BlockNumber BlockSampler_Next(BlockSampler bs);
/* Reservoir sampling methods */
-typedef double ReservoirStateData;
+typedef struct
+{
+ double W;
+ SamplerRandomState randstate; /* random generator state */
+} ReservoirStateData;
+
typedef ReservoirStateData *ReservoirState;
extern void reservoir_init_selection_state(ReservoirState rs, int n);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..6b628f6 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/include/utils/tablesample.h b/src/include/utils/tablesample.h
new file mode 100644
index 0000000..1a24cb6
--- /dev/null
+++ b/src/include/utils/tablesample.h
@@ -0,0 +1,27 @@
+/*--------------------------------------------------------------------------
+ * tablesample.h
+ * Header file for builtin table sampling methods.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/utils/tablesample.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TABLESAMPLE_H
+#define TABLESAMPLE_H
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_system_cost(PG_FUNCTION_ARGS);
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_cost(PG_FUNCTION_ARGS);
+
+#endif /* TABLESAMPLE_H */
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..5946edf 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesample_method|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..ce9abf7
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,168 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+ count
+-------
+ 10
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+ pg_get_viewdef
+--------------------------------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system (((10 * 2))::real) REPEATABLE (2);
+(1 row)
+
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+ pg_get_viewdef
+-----------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system ((99)::real);
+(1 row)
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+CLOSE tablesample_cur;
+END;
+-- should fail
+SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1);
+ERROR: TABLESAMPLE clause can only be used on tables and materialized views
+DROP TABLE test_tablesample CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to view test_tablesample_v1
+drop cascades to view test_tablesample_v2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 6d3b865..300e1fb 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address tablesample
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 8326894..d815496 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -153,3 +153,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..0d8ce39
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,42 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+CLOSE tablesample_cur;
+END;
+
+-- should fail
+SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1);
+
+DROP TABLE test_tablesample CASCADE;
--
1.9.1
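
The REPEATABLE cases in the regression test above depend on the sampling method drawing from a generator seeded with the user-supplied value. The standalone sketch below illustrates the idea for BERNOULLI: each row is kept independently with probability percent/100, and the same seed reproduces the same choice. The bern_* names and the small LCG are hypothetical stand-ins chosen for illustration; they are not the tsm_bernoulli_* functions from the patch, and the real seed handling may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-scan state: sampling percentage plus generator state. */
typedef struct
{
    double      percent;    /* 0..100, as written in TABLESAMPLE (...) */
    uint64_t    rng;        /* demo generator state (assumption) */
} BernState;

/* Demo generator: 64-bit LCG, returns a value strictly inside (0, 1). */
static double
bern_random_fract(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    /* use the top 31 bits; +1 and +2 keep the result away from 0 and 1 */
    return ((double) (*state >> 33) + 1) / (2147483647.0 + 2);
}

/* Start a scan; the seed plays the role of the REPEATABLE argument. */
static void
bern_init(BernState *st, double percent, uint64_t seed)
{
    assert(percent >= 0.0 && percent <= 100.0);
    st->percent = percent;
    st->rng = seed;
}

/* Decide independently for one row whether it belongs to the sample. */
static bool
bern_keep_row(BernState *st)
{
    return bern_random_fract(&st->rng) < st->percent / 100.0;
}

/* Fill keep[0..nrows-1] with the per-row decisions; return rows kept. */
static size_t
bern_sample(bool *keep, size_t nrows, double percent, uint64_t seed)
{
    BernState   st;
    size_t      kept = 0;

    bern_init(&st, percent, seed);
    for (size_t i = 0; i < nrows; i++)
    {
        keep[i] = bern_keep_row(&st);
        if (keep[i])
            kept++;
    }
    return kept;
}
```

Unlike block-level SYSTEM sampling, this visits every row, which matches the description of BERNOULLI as scanning the whole table. The number of rows returned varies around percent/100 of the table rather than being fixed.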
0001-separate-block-sampling-functions-v2.patch (text/x-diff)
From 23a512b5a8417d0ef656381482d30b147b872cd8 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:36:56 +0100
Subject: [PATCH 1/3] separate block sampling functions v2
---
contrib/file_fdw/file_fdw.c | 9 +-
contrib/postgres_fdw/postgres_fdw.c | 10 +-
src/backend/commands/analyze.c | 225 +----------------------------------
src/backend/utils/misc/Makefile | 2 +-
src/backend/utils/misc/sampling.c | 226 ++++++++++++++++++++++++++++++++++++
src/include/commands/vacuum.h | 3 -
src/include/utils/sampling.h | 44 +++++++
7 files changed, 287 insertions(+), 232 deletions(-)
create mode 100644 src/backend/utils/misc/sampling.c
create mode 100644 src/include/utils/sampling.h
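
For readers who want to experiment with the block sampler outside the backend, here is a minimal standalone sketch of Knuth's Algorithm S in the single-random-draw form that BlockSampler_Next uses. It is not the code from this patch: the Demo*/demo_* names and the small LCG standing in for sampler_random_fract() are assumptions made for illustration, and the way the seed is consumed is only a guess at how randseed is meant to be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for BlockSamplerData; names are illustrative only. */
typedef struct
{
    uint32_t    N;          /* number of blocks, known in advance */
    int         n;          /* desired sample size */
    uint32_t    t;          /* current block number */
    int         m;          /* blocks selected so far */
    uint64_t    rng;        /* demo generator state (assumption) */
} DemoBlockSampler;

/* Demo generator: 64-bit LCG, returns a value strictly inside (0, 1). */
static double
demo_random_fract(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    /* use the top 31 bits; +1 and +2 keep the result away from 0 and 1 */
    return ((double) (*state >> 33) + 1) / (2147483647.0 + 2);
}

static void
demo_sampler_init(DemoBlockSampler *bs, uint32_t nblocks, int samplesize,
                  uint64_t seed)
{
    bs->N = nblocks;
    bs->n = samplesize;
    bs->t = 0;
    bs->m = 0;
    bs->rng = seed;
}

static bool
demo_sampler_has_more(const DemoBlockSampler *bs)
{
    return (bs->t < bs->N) && (bs->m < bs->n);
}

static uint32_t
demo_sampler_next(DemoBlockSampler *bs)
{
    uint32_t    K = bs->N - bs->t;  /* remaining blocks */
    int         k = bs->n - bs->m;  /* blocks still to sample */
    double      p;                  /* probability to skip block */
    double      V;                  /* random draw, reused across skips */

    assert(demo_sampler_has_more(bs));  /* hence K > 0 and k > 0 */

    if ((uint32_t) k >= K)
    {
        /* need all the rest */
        bs->m++;
        return bs->t++;
    }

    V = demo_random_fract(&bs->rng);
    p = 1.0 - (double) k / (double) K;
    while (V < p)
    {
        /* skip this block and shrink the cutoff for the next one */
        bs->t++;
        K--;
        p *= 1.0 - (double) k / (double) K;
    }

    /* select */
    bs->m++;
    return bs->t++;
}
```

Calling demo_sampler_next() until demo_sampler_has_more() returns false yields min(n, N) block numbers in strictly increasing order, and the same seed reproduces the same sequence.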
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 4368897..249d541 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -34,6 +34,7 @@
#include "optimizer/var.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -1005,7 +1006,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
{
int numrows = 0;
double rowstoskip = -1; /* -1 means not set yet */
- double rstate;
+ ReservoirStateData rstate;
TupleDesc tupDesc;
Datum *values;
bool *nulls;
@@ -1043,7 +1044,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
ALLOCSET_DEFAULT_MAXSIZE);
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
@@ -1087,7 +1088,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* not-yet-incremented value of totalrows as t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(*totalrows, targrows, &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, *totalrows, targrows);
if (rowstoskip <= 0)
{
@@ -1095,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 63f0577..59aaff7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -37,6 +37,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -202,7 +203,7 @@ typedef struct PgFdwAnalyzeState
/* for random sampling */
double samplerows; /* # of rows fetched */
double rowstoskip; /* # of rows to skip before next sample */
- double rstate; /* random state */
+ ReservoirStateData rstate; /* state for reservoir sampling */
/* working memory contexts */
MemoryContext anl_cxt; /* context for per-analyze lifespan data */
@@ -2393,7 +2394,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
astate.numrows = 0;
astate.samplerows = 0;
astate.rowstoskip = -1; /* -1 means not set yet */
- astate.rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&astate.rstate, targrows);
/* Remember ANALYZE context, and create a per-tuple temp context */
astate.anl_cxt = CurrentMemoryContext;
@@ -2533,13 +2534,12 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
* analyze.c; see Jeff Vitter's paper.
*/
if (astate->rowstoskip < 0)
- astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
- &astate->rstate);
+ astate->rowstoskip = reservoir_get_next_S(&astate->rstate, astate->samplerows, targrows);
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * anl_random_fract());
+ pos = (int) (targrows * sampler_random_fract());
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d2856a3..fc9dd44 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -88,10 +78,6 @@ static BufferAccessStrategy vac_strategy;
static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -947,94 +933,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1080,7 +978,7 @@ acquire_sample_rows(Relation onerel, int elevel,
BlockNumber totalblocks;
TransactionId OldestXmin;
BlockSamplerData bs;
- double rstate;
+ ReservoirStateData rstate;
Assert(targrows > 0);
@@ -1090,9 +988,9 @@ acquire_sample_rows(Relation onerel, int elevel,
OldestXmin = GetOldestXmin(onerel, true);
/* Prepare for sampling block numbers */
- BlockSampler_Init(&bs, totalblocks, targrows);
+ BlockSampler_Init(&bs, totalblocks, targrows, random());
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Outer loop over blocks to sample */
while (BlockSampler_HasMore(&bs))
@@ -1240,8 +1138,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(samplerows, targrows,
- &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, samplerows, targrows);
if (rowstoskip <= 0)
{
@@ -1249,7 +1146,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1308,116 +1205,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
-/*
- * These two routines embody Algorithm Z from "Random sampling with a
- * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
- * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
- * of the count S of records to skip before processing another record.
- * It is computed primarily based on t, the number of records already read.
- * The only extra state needed between calls is W, a random state variable.
- *
- * anl_init_selection_state computes the initial W value.
- *
- * Given that we've already read t records (t >= n), anl_get_next_S
- * determines the number of records to skip before the next record is
- * processed.
- */
-double
-anl_init_selection_state(int n)
-{
- /* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
-}
-
-double
-anl_get_next_S(double t, int n, double *stateptr)
-{
- double S;
-
- /* The magic constant here is T from Vitter's paper */
- if (t <= (22.0 * n))
- {
- /* Process records using Algorithm X until t is large enough */
- double V,
- quot;
-
- V = anl_random_fract(); /* Generate V */
- S = 0;
- t += 1;
- /* Note: "num" in Vitter's code is always equal to t - n */
- quot = (t - (double) n) / t;
- /* Find min S satisfying (4.1) */
- while (quot > V)
- {
- S += 1;
- t += 1;
- quot *= (t - (double) n) / t;
- }
- }
- else
- {
- /* Now apply Algorithm Z */
- double W = *stateptr;
- double term = t - (double) n + 1;
-
- for (;;)
- {
- double numer,
- numer_lim,
- denom;
- double U,
- X,
- lhs,
- rhs,
- y,
- tmp;
-
- /* Generate U and X */
- U = anl_random_fract();
- X = t * (W - 1.0);
- S = floor(X); /* S is tentatively set to floor(X) */
- /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
- tmp = (t + 1) / term;
- lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
- rhs = (((t + X) / (term + S)) * term) / t;
- if (lhs <= rhs)
- {
- W = rhs / lhs;
- break;
- }
- /* Test if U <= f(S)/cg(X) */
- y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
- if ((double) n < S)
- {
- denom = t;
- numer_lim = term + S;
- }
- else
- {
- denom = t - (double) n + S;
- numer_lim = t + 1;
- }
- for (numer = t + S; numer >= numer_lim; numer -= 1)
- {
- y *= numer / denom;
- denom -= 1;
- }
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
- if (exp(log(y) / n) <= (t + X) / t)
- break;
- }
- *stateptr = W;
- }
- return S;
-}
-
/*
* qsort comparator for sorting rows[] array
*/
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 378b77e..7889101 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o rls.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..1eeabaf
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Relation block sampling routines.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler provides an algorithm for block-level sampling of a relation,
+ * as discussed on pgsql-hackers 2004-04-02 (subject "Large DB").
+ * It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
+ long randseed)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires a sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+/*
+ * These two routines embody Algorithm Z from "Random sampling with a
+ * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
+ * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
+ * of the count S of records to skip before processing another record.
+ * It is computed primarily based on t, the number of records already read.
+ * The only extra state needed between calls is W, a random state variable.
+ *
+ * reservoir_init_selection_state computes the initial W value.
+ *
+ * Given that we've already read t records (t >= n), reservoir_get_next_S
+ * determines the number of records to skip before the next record is
+ * processed.
+ */
+void
+reservoir_init_selection_state(ReservoirState rs, int n)
+{
+ /* Initial value of W (for use when Algorithm Z is first applied) */
+ *rs = exp(-log(sampler_random_fract()) / n);
+}
+
+double
+reservoir_get_next_S(ReservoirState rs, double t, int n)
+{
+ double S;
+
+ /* The magic constant here is T from Vitter's paper */
+ if (t <= (22.0 * n))
+ {
+ /* Process records using Algorithm X until t is large enough */
+ double V,
+ quot;
+
+ V = sampler_random_fract(); /* Generate V */
+ S = 0;
+ t += 1;
+ /* Note: "num" in Vitter's code is always equal to t - n */
+ quot = (t - (double) n) / t;
+ /* Find min S satisfying (4.1) */
+ while (quot > V)
+ {
+ S += 1;
+ t += 1;
+ quot *= (t - (double) n) / t;
+ }
+ }
+ else
+ {
+ /* Now apply Algorithm Z */
+ double W = *rs;
+ double term = t - (double) n + 1;
+
+ for (;;)
+ {
+ double numer,
+ numer_lim,
+ denom;
+ double U,
+ X,
+ lhs,
+ rhs,
+ y,
+ tmp;
+
+ /* Generate U and X */
+ U = sampler_random_fract();
+ X = t * (W - 1.0);
+ S = floor(X); /* S is tentatively set to floor(X) */
+ /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
+ tmp = (t + 1) / term;
+ lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
+ rhs = (((t + X) / (term + S)) * term) / t;
+ if (lhs <= rhs)
+ {
+ W = rhs / lhs;
+ break;
+ }
+ /* Test if U <= f(S)/cg(X) */
+ y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
+ if ((double) n < S)
+ {
+ denom = t;
+ numer_lim = term + S;
+ }
+ else
+ {
+ denom = t - (double) n + S;
+ numer_lim = t + 1;
+ }
+ for (numer = t + S; numer >= numer_lim; numer -= 1)
+ {
+ y *= numer / denom;
+ denom -= 1;
+ }
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ if (exp(log(y) / n) <= (t + X) / t)
+ break;
+ }
+ *rs = W;
+ }
+ return S;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+
+/* Select a random value R uniformly distributed in (0, 1) */
+double
+sampler_random_fract(void)
+{
+ return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+}
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4275484..d38fead 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -178,8 +178,5 @@ extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
bool in_outer_xact, BufferAccessStrategy bstrategy);
extern bool std_typanalyze(VacAttrStats *stats);
-extern double anl_random_fract(void);
-extern double anl_init_selection_state(int n);
-extern double anl_get_next_S(double t, int n, double *stateptr);
#endif /* VACUUM_H */
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..e3e7f9c
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+extern double sampler_random_fract(void);
+
+/* Block sampling methods */
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize, long randseed);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Reservoir sampling methods */
+typedef double ReservoirStateData;
+typedef ReservoirStateData *ReservoirState;
+
+extern void reservoir_init_selection_state(ReservoirState rs, int n);
+extern double reservoir_get_next_S(ReservoirState rs, double t, int n);
+
+#endif /* SAMPLING_H */
--
1.9.1
On 03/15/15 16:21, Petr Jelinek wrote:
I also did all the other adjustments we talked about up-thread and
rebased against current master (there was conflict with 31eae6028).
Hi,
I did a review of the version submitted on 03/15 today, and only found a
few minor issues:
1) The documentation of the pg_tablesample_method catalog is missing
documentation of the 'tsmpagemode' column, which was added later.
2) transformTableEntry() in parse_clause modifies the comment, in a way
that made sense before part of the code was moved to a separate
function. I suggest to revert the comment changes, and instead add
the comment to transformTableSampleEntry()
3) The "shared" parts of the block sampler in sampling.c (e.g. in
BlockSampler_Next) reference Vitter's algorithm (both the code and
comments) which is a bit awkward as the only part that uses it is
analyze.c. The other samplers using this code (system / bernoulli)
don't use Vitter's algorithm.
I don't think it's possible to separate this piece of code, though.
It simply has to be in there somewhere, so I'd recommend adding here
a simple comment explaining that it's needed because of analyze.c.
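For reference, the selection scheme that the comments in BlockSampler_Next attribute to Knuth's Algorithm S can be sketched in a standalone form. This is a simplified illustration only, not the code from the patch: it draws one random number per block rather than one per selected block, and the names demo_random_fract and demo_select_blocks are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: uniform value in (0, 1), built on the C library rand(). */
static double
demo_random_fract(void)
{
	return ((double) rand() + 1.0) / ((double) RAND_MAX + 2.0);
}

/*
 * Naive form of selection sampling: pick n of the block numbers 0..N-1
 * (or all of them if n >= N) in increasing order and store them in out[].
 * Returns the number of blocks selected.
 */
static int
demo_select_blocks(unsigned N, int n, unsigned *out)
{
	unsigned	t = 0;			/* blocks examined so far */
	int			m = 0;			/* blocks selected so far */

	while (t < N && m < n)
	{
		unsigned	K = N - t;	/* remaining blocks */
		int			k = n - m;	/* blocks still to sample */

		/* Select block t with probability k/K; always when k >= K. */
		if ((unsigned) k >= K ||
			demo_random_fract() * (double) K < (double) k)
			out[m++] = t;
		t++;
	}
	return m;
}
```

Because a block is always taken once the remaining blocks equal the blocks still needed, the sketch returns exactly min(n, N) block numbers, which is the same guarantee the comment in BlockSampler_Next argues for.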
Otherwise I think the patch is OK. I'll wait for Petr to fix these
issues, and then mark it as ready for committer.
What do you think, Amit? (BTW you should probably add yourself as
reviewer in the CF app, as you've provided a lot of feedback here.)
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Apr 1, 2015 at 6:31 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
On 03/15/15 16:21, Petr Jelinek wrote:
I also did all the other adjustments we talked about up-thread and
rebased against current master (there was conflict with 31eae6028).
Hi,
I did a review of the version submitted on 03/15 today, and only found a
few minor issues:
1) The documentation of the pg_tablesample_method catalog is missing
documentation of the 'tsmpagemode' column, which was added later.
2) transformTableEntry() in parse_clause modifies the comment, in a way
that made sense before part of the code was moved to a separate
function. I suggest to revert the comment changes, and instead add
the comment to transformTableSampleEntry()
3) The "shared" parts of the block sampler in sampling.c (e.g. in
BlockSampler_Next) reference Vitter's algorithm (both the code and
comments) which is a bit awkward as the only part that uses it is
analyze.c. The other samplers using this code (system / bernoulli)
don't use Vitter's algorithm.
I don't think it's possible to separate this piece of code, though.
It simply has to be in there somewhere, so I'd recommend adding here
a simple comment explaining that it's needed because of analyze.c.
Otherwise I think the patch is OK. I'll wait for Petr to fix these
issues, and then mark it as ready for committer.
What do you think, Amit? (BTW you should probably add yourself as
reviewer in the CF app, as you've provided a lot of feedback here.)
I am still not sure whether it is okay to move REPEATABLE from
unreserved to other category. In-fact last weekend I have spent some
time to see the exact reason for shift/reduce errors and tried some ways
but didn't find a way to get away with the same. Now I am planning to
spend some more time on the same probably in next few days and then
still if I cannot find a way, I will share my findings and then once
re-review
the changes made by Petr in last version. I think overall the patch is in
good shape now although I haven't looked into DDL support part of the
patch which I thought could be done in a separate patch as well.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 1, 2015 at 9:49 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I am still not sure whether it is okay to move REPEATABLE from
unreserved to other category. In-fact last weekend I have spent some
time to see the exact reason for shift/reduce errors and tried some ways
but didn't find a way to get away with the same. Now I am planning to
spend some more time on the same probably in next few days and then
still if I cannot find a way, I will share my findings and then once
re-review
the changes made by Petr in last version. I think overall the patch is in
good shape now although I haven't looked into DDL support part of the
patch which I thought could be done in a separate patch as well.
That seems like a legitimate concern. We usually try not to make
keywords more reserved in PostgreSQL than they are in the SQL
standard, and REPEATABLE is apparently non-reserved there:
http://www.postgresql.org/docs/devel/static/sql-keywords-appendix.html
This also makes "method" an unreserved keyword, which I'm not wild
about either. Adding new keyword doesn't cost *much*, but is this
SQL-mandated syntax or something we created? If the latter, can we
find something to call it that doesn't require new keywords?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 01/04/15 17:52, Robert Haas wrote:
On Wed, Apr 1, 2015 at 9:49 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I am still not sure whether it is okay to move REPEATABLE from
unreserved to other category. In-fact last weekend I have spent some
time to see the exact reason for shift/reduce errors and tried some ways
but didn't find a way to get away with the same. Now I am planning to
spend some more time on the same probably in next few days and then
still if I cannot find a way, I will share my findings and then once
re-review
the changes made by Petr in last version. I think overall the patch is in
good shape now although I haven't looked into DDL support part of the
patch which I thought could be done in a separate patch as well.
That seems like a legitimate concern. We usually try not to make
keywords more reserved in PostgreSQL than they are in the SQL
standard, and REPEATABLE is apparently non-reserved there:
http://www.postgresql.org/docs/devel/static/sql-keywords-appendix.html
This also makes "method" an unreserved keyword, which I'm not wild
about either. Adding new keyword doesn't cost *much*, but is this
SQL-mandated syntax or something we created? If the latter, can we
find something to call it that doesn't require new keywords?
REPEATABLE is mandated by standard. I did try for quite some time to
make it unreserved but was not successful (I can only make it unreserved
if I make it mandatory but that's not a solution). I haven't been in
fact even able to find out what it actually conflicts with...
METHOD is something I added. I guess we could find a way to name this
differently if we really tried. The reason why I picked METHOD was that
I already added the same unreserved keyword in the sequence AMs patch
and in that one any other name does not really make sense.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Apr 1, 2015 at 12:15 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
REPEATABLE is mandated by standard. I did try for quite some time to make it
unreserved but was not successful (I can only make it unreserved if I make
it mandatory but that's not a solution). I haven't been in fact even able to
find out what it actually conflicts with...
Yeah, that can be hard to figure out. Did you run bison with -v and
poke around in gram.output?
METHOD is something I added. I guess we could find a way to name this
differently if we really tried. The reason why I picked METHOD was that I
already added the same unreserved keyword in the sequence AMs patch and in
that one any other name does not really make sense.
I see. Well, I guess if we've got more than one use for it it's not so bad.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 01/04/15 18:38, Robert Haas wrote:
On Wed, Apr 1, 2015 at 12:15 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
REPEATABLE is mandated by standard. I did try for quite some time to make it
unreserved but was not successful (I can only make it unreserved if I make
it mandatory but that's not a solution). I haven't been in fact even able to
find out what it actually conflicts with...
Yeah, that can be hard to figure out. Did you run bison with -v and
poke around in gram.output?
Oh, no I didn't (I didn't know gram.output will be generated). This
helped quite a bit. Thanks.
I now found the reason, it's conflicting with alias but that's my
mistake - alias should be before TABLESAMPLE clause as per standard and
I put it after in parser. Now that I put it at correct place REPEATABLE
can be unreserved keyword. This change requires making TABLESAMPLE a
"type_func_name_keyword" but that's probably not really an issue as
TABLESAMPLE is reserved keyword per standard.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
so here is version 11.
Addressing Tomas' comments:
1) The documentation of the pg_tablesample_method catalog is missing
documentation of the 'tsmpagemode' column, which was added later.
Fixed.
2) transformTableEntry() in parse_clause modifies the comment, in a way
that made sense before part of the code was moved to a separate
function. I suggest to revert the comment changes, and instead add
the comment to transformTableSampleEntry()
Fixed.
3) The "shared" parts of the block sampler in sampling.c (e.g. in
BlockSampler_Next) reference Vitter's algorithm (both the code and
comments) which is a bit awkward as the only part that uses it is
analyze.c. The other samplers using this code (system / bernoulli)
don't use Vitter's algorithm.
Actually the Vitter's reservoir is implemented by
reservoir_init_selection_state and reservoir_get_next_S functions in the
sampling.c and is used by analyze, file_fdw and postgres_fdw. It was
previously exported from analyze.h/c but I think it's better to have it
together with the block sampling so that we have all the sampling
methods together.
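To illustrate how a caller typically drives a skip-count reservoir function such as reservoir_get_next_S, here is a minimal standalone sketch. It is an assumption-laden simplification: it only mirrors the Algorithm X branch visible in the patch (not Algorithm Z), samples plain integers instead of tuples, and the names sketch_random_fract, sketch_get_next_S and sketch_reservoir_sample are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: uniform value in (0, 1), built on the C library rand(). */
static double
sketch_random_fract(void)
{
	return ((double) rand() + 1.0) / ((double) RAND_MAX + 2.0);
}

/*
 * Algorithm X only: given that t records have been read (t >= n >= 1),
 * return how many records to skip before the next one enters the reservoir.
 */
static double
sketch_get_next_S(double t, int n)
{
	double		V = sketch_random_fract();
	double		S = 0;
	double		quot;

	t += 1;
	quot = (t - (double) n) / t;
	while (quot > V)
	{
		S += 1;
		t += 1;
		quot *= (t - (double) n) / t;
	}
	return S;
}

/*
 * Sample n of the values 0..total-1 into reservoir[]; returns the number
 * of slots filled, which is min(n, total).
 */
static int
sketch_reservoir_sample(int total, int n, int *reservoir)
{
	int			i = 0;			/* index of the next record to read */
	int			filled = 0;

	if (n <= 0)
		return 0;

	/* The first n records go straight into the reservoir. */
	while (i < total && filled < n)
		reservoir[filled++] = i++;

	while (i < total)
	{
		double		S = sketch_get_next_S((double) i, n);
		int			victim;

		if (S >= (double) (total - i))
			break;				/* the skip runs past the end of the input */
		i += (int) S;			/* skip S records */

		/* Replace a randomly chosen slot with record i. */
		victim = (int) (n * sketch_random_fract());
		reservoir[victim] = i++;
	}
	return filled;
}
```

The point of the skip count is that the caller does not need a random draw for every record: after the reservoir is full, it jumps directly to the next record that will replace an existing entry.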
As I mentioned before in this thread I fixed the REPEATABLE keyword
issue and fixed alias (AS) positioning in parser.
I also attached skeleton docs for the API, it could use more work, but I
am afraid I won't come up with something significantly better soon.
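As a rough illustration of the calling pattern the API docs describe (init, then nextblock until an invalid block number is returned, and nexttuple until an invalid offset is returned for each page), here is a standalone mock. It does not use the real SampleScanState or fmgr interfaces; every type and function name below (DemoSampler, demo_init, demo_nextblock, demo_nexttuple and the DEMO_ constants) is hypothetical, and the per-tuple coin flip is only a stand-in for a BERNOULLI-style method.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for block and offset numbers and their invalid values. */
typedef uint32_t DemoBlockNumber;
typedef uint16_t DemoOffsetNumber;

#define DEMO_INVALID_BLOCK	((DemoBlockNumber) 0xFFFFFFFF)
#define DEMO_INVALID_OFFSET ((DemoOffsetNumber) 0)

typedef struct DemoSampler
{
	DemoBlockNumber nblocks;	/* number of blocks in the mock relation */
	DemoBlockNumber nextblock;	/* next block to hand out */
	DemoOffsetNumber lastoffset;	/* last offset examined on this page */
	double		fraction;		/* probability of returning each tuple */
} DemoSampler;

/* init: called at the beginning of each scan */
static void
demo_init(DemoSampler *s, DemoBlockNumber nblocks, uint32_t seed, double percent)
{
	srand(seed);
	s->nblocks = nblocks;
	s->nextblock = 0;
	s->lastoffset = DEMO_INVALID_OFFSET;
	s->fraction = percent / 100.0;
}

/* nextblock: visit every block in order, then report the end of the relation */
static DemoBlockNumber
demo_nextblock(DemoSampler *s)
{
	if (s->nextblock >= s->nblocks)
		return DEMO_INVALID_BLOCK;
	s->lastoffset = DEMO_INVALID_OFFSET;	/* start the new page from scratch */
	return s->nextblock++;
}

/* nexttuple: return the next sampled offset (1..maxoffset) on the current page */
static DemoOffsetNumber
demo_nexttuple(DemoSampler *s, DemoOffsetNumber maxoffset)
{
	DemoOffsetNumber off = s->lastoffset;

	while (off < maxoffset)
	{
		off++;
		/* rand() / (RAND_MAX + 1.0) lies in [0, 1) */
		if ((double) rand() / ((double) RAND_MAX + 1.0) < s->fraction)
		{
			s->lastoffset = off;
			return off;
		}
	}
	s->lastoffset = off;
	return DEMO_INVALID_OFFSET; /* end of page */
}
```

A driver would loop over demo_nextblock until it returns DEMO_INVALID_BLOCK and, for each block, call demo_nexttuple until it returns DEMO_INVALID_OFFSET. With percent set to 100 every offset on every page is returned; with 0 none are.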
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0004-tablesample-api-doc-v1.patch (text/x-diff)
From 813e641c9ae202b8bfb4be4b978bb2c22a60eea4 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sun, 15 Mar 2015 17:39:22 +0100
Subject: [PATCH 4/4] tablesample api doc v1
---
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/postgres.sgml | 1 +
doc/src/sgml/tablesample-method.sgml | 169 +++++++++++++++++++++++++++++++++++
3 files changed, 171 insertions(+)
create mode 100644 doc/src/sgml/tablesample-method.sgml
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89fff77..23d932d 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -98,6 +98,7 @@
<!ENTITY protocol SYSTEM "protocol.sgml">
<!ENTITY sources SYSTEM "sources.sgml">
<!ENTITY storage SYSTEM "storage.sgml">
+<!ENTITY tablesample-method SYSTEM "tablesample-method.sgml">
<!-- contrib information -->
<!ENTITY contrib SYSTEM "contrib.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index e378d69..dc1f020 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -250,6 +250,7 @@
&gin;
&brin;
&storage;
+ &tablesample-method;
&bki;
&planstats;
diff --git a/doc/src/sgml/tablesample-method.sgml b/doc/src/sgml/tablesample-method.sgml
new file mode 100644
index 0000000..2d6d323
--- /dev/null
+++ b/doc/src/sgml/tablesample-method.sgml
@@ -0,0 +1,169 @@
+<!-- doc/src/sgml/tablesample-method.sgml -->
+
+<chapter id="tablesample-method">
+ <title>Writing A TABLESAMPLE Sampling Method</title>
+
+ <indexterm zone="tablesample-method">
+ <primary>tablesample method</primary>
+ </indexterm>
+
+ <para>
+ The <command>TABLESAMPLE</command> clause implementation in
+ <productname>PostgreSQL</> supports creating custom sampling methods.
+ These methods control what sample of the table will be returned when the
+ <command>TABLESAMPLE</command> clause is used.
+ </para>
+
+ <sect1 id="tablesample-method-functions">
+ <title>Tablesample Method Functions</title>
+
+ <para>
+ A tablesample method must provide the following set of functions:
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_init (SampleScanState *scanstate,
+ uint32 seed, ...);
+</programlisting>
+ Initialize the tablesample scan. The function is called at the beginning
+ of each relation scan.
+ </para>
+ <para>
+ Note that the first two parameters are required, but you can specify
+ additional parameters, which the <command>TABLESAMPLE</> clause then
+ uses to determine the user input required in the query itself.
+ This means that if your function specifies an additional float4 parameter
+ named percent, the user will have to call the tablesample method with
+ an expression which evaluates (or can be coerced) to float4.
+ For example, this definition:
+<programlisting>
+tsm_init (SampleScanState *scanstate,
+ uint32 seed, float4 pct);
+</programlisting>
+will lead to an SQL call like this:
+<programlisting>
+... TABLESAMPLE yourmethod(0.5) ...
+</programlisting>
+ </para>
+
+ <para>
+<programlisting>
+BlockNumber
+tsm_nextblock (SampleScanState *scanstate);
+</programlisting>
+ Returns the block number of the next page to be scanned. InvalidBlockNumber
+ should be returned if the sampling has reached the end of the relation.
+ </para>
+
+ <para>
+<programlisting>
+OffsetNumber
+tsm_nexttuple (SampleScanState *scanstate, BlockNumber blockno,
+ OffsetNumber maxoffset);
+</programlisting>
+ Returns the next tuple offset for the current page. InvalidOffsetNumber should
+ be returned if the sampling has reached the end of the page.
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_end (SampleScanState *scanstate);
+</programlisting>
+ The scan has finished; clean up any leftover state.
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_reset (SampleScanState *scanstate);
+</programlisting>
+ The relation needs to be rescanned; reset any tablesample method
+ state.
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_cost (PlannerInfo *root, Path *path, RelOptInfo *baserel,
+ List *args, BlockNumber *pages, double *tuples);
+</programlisting>
+ This function is used by the optimizer to choose the best plan and is also
+ used for the output of <command>EXPLAIN</>.
+ </para>
+
+ <para>
+ A tablesample method can also implement the following function to gain
+ more fine-grained control over sampling. This function is optional:
+ </para>
+
+ <para>
+<programlisting>
+bool
+tsm_examinetuple (SampleScanState *scanstate, BlockNumber blockno,
+ HeapTuple tuple, bool visible);
+</programlisting>
+ Function that enables the sampling method to examine the contents of a tuple
+ (for example to collect some internal statistics). The return value of this
+ function is used to determine if the tuple should be returned to the client.
+ Note that this function will receive even invisible tuples, but it is not
+ allowed to return true for such a tuple (if it does,
+ <productname>PostgreSQL</> will raise an error).
+ </para>
+
+ <para>
+ As you can see, most of the tablesample method interfaces get the
+ <structname>SampleScanState</> as the first parameter. This structure holds
+ the state of the current scan and also provides storage for the tablesample
+ method's state. It is defined as follows:
+<programlisting>
+typedef struct SampleScanState
+{
+ ScanState ss;
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmexaminetuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+ void *tsmdata;
+} SampleScanState;
+</programlisting>
+ Where <structfield>ss</> is the <structname>ScanState</> itself. From it, you
+ can get <structfield>ss_currentRelation</> (the currently scanned relation) and
+ <structfield>ss_currentScanDesc</> (information about the scan).
+ Those are usually useful for the <function>tsm_init</> function.
+ The <structfield>tsminit</>, <structfield>tsmnextblock</>,
+ <structfield>tsmnexttuple</>, <structfield>tsmexaminetuple</>,
+ <structfield>tsmend</> and <structfield>tsmreset</> fields refer to the
+ tablesample method functions for use by the sample scan itself; the tablesample
+ method does not need to be concerned about them. <structfield>tsmdata</> can be
+ used by the tablesample method to store any state it might need during the scan.
+ </para>
+ </sect1>
+
+ <sect1 id="tablesample-method-sql">
+ <title>Tablesample Method Installation</title>
+
+ <para>
+ Once you have written and built the custom tablesample method, you can
+ install it using the SQL statement
+ <xref linkend="sql-createtablesamplemethod"> and remove it again using
+ <xref linkend="sql-droptablesamplemethod">.
+ </para>
+
+ </sect1>
+
+ <sect1 id="tablesample-method-example">
+ <title>Tablesample Method Example</title>
+
+ <para>
+ An example of how to implement a custom tablesample method can be found in
+ the <productname>PostgreSQL</> sources under the
+ <filename>src/test/modules/tablesample</> directory.
+ </para>
+ </sect1>
+
+</chapter>
--
1.9.1
0003-tablesample-ddl-v6.patch (text/x-diff)
From 01040f72fffe090e9fcd409bfd43c5ed4da26897 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:51:44 +0100
Subject: [PATCH 3/4] tablesample-ddl v6
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_tablesamplemethod.sgml | 184 +++++++++
doc/src/sgml/ref/drop_tablesamplemethod.sgml | 87 +++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/dependency.c | 15 +-
src/backend/catalog/objectaddress.c | 65 +++-
src/backend/commands/Makefile | 6 +-
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/tablecmds.c | 1 +
src/backend/commands/tablesample.c | 427 +++++++++++++++++++++
src/backend/parser/gram.y | 14 +-
src/backend/tcop/utility.c | 12 +
src/bin/pg_dump/common.c | 5 +
src/bin/pg_dump/pg_dump.c | 177 +++++++++
src/bin/pg_dump/pg_dump.h | 11 +-
src/bin/pg_dump/pg_dump_sort.c | 11 +-
src/include/catalog/dependency.h | 1 +
src/include/catalog/pg_tablesample_method.h | 11 +
src/include/nodes/parsenodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/test/modules/Makefile | 3 +-
src/test/modules/tablesample/.gitignore | 4 +
src/test/modules/tablesample/Makefile | 21 +
.../modules/tablesample/expected/tablesample.out | 38 ++
src/test/modules/tablesample/sql/tablesample.sql | 14 +
src/test/modules/tablesample/tsm_test--1.0.sql | 52 +++
src/test/modules/tablesample/tsm_test.c | 228 +++++++++++
src/test/modules/tablesample/tsm_test.control | 5 +
29 files changed, 1394 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/create_tablesamplemethod.sgml
create mode 100644 doc/src/sgml/ref/drop_tablesamplemethod.sgml
create mode 100644 src/backend/commands/tablesample.c
create mode 100644 src/test/modules/tablesample/.gitignore
create mode 100644 src/test/modules/tablesample/Makefile
create mode 100644 src/test/modules/tablesample/expected/tablesample.out
create mode 100644 src/test/modules/tablesample/sql/tablesample.sql
create mode 100644 src/test/modules/tablesample/tsm_test--1.0.sql
create mode 100644 src/test/modules/tablesample/tsm_test.c
create mode 100644 src/test/modules/tablesample/tsm_test.control
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 5b4692f..d31a2db 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -78,6 +78,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createServer SYSTEM "create_server.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
+<!ENTITY createTablesampleMethod SYSTEM "create_tablesamplemethod.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
<!ENTITY createTrigger SYSTEM "create_trigger.sgml">
<!ENTITY createTSConfig SYSTEM "create_tsconfig.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
+<!ENTITY dropTablesampleMethod SYSTEM "drop_tablesamplemethod.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTrigger SYSTEM "drop_trigger.sgml">
<!ENTITY dropTSConfig SYSTEM "drop_tsconfig.sgml">
diff --git a/doc/src/sgml/ref/create_tablesamplemethod.sgml b/doc/src/sgml/ref/create_tablesamplemethod.sgml
new file mode 100644
index 0000000..ff105d2
--- /dev/null
+++ b/doc/src/sgml/ref/create_tablesamplemethod.sgml
@@ -0,0 +1,184 @@
+<!--
+doc/src/sgml/ref/create_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATETABLESAMPLEMETHOD">
+ <indexterm zone="sql-createtablesamplemethod">
+ <primary>CREATE TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE TABLESAMPLE METHOD</refname>
+ <refpurpose>define a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE TABLESAMPLE METHOD <replaceable class="parameter">name</replaceable> (
+ INIT = <replaceable class="parameter">init_function</replaceable> ,
+ NEXTBLOCK = <replaceable class="parameter">nextblock_function</replaceable> ,
+ NEXTTUPLE = <replaceable class="parameter">nexttuple_function</replaceable> ,
+ END = <replaceable class="parameter">end_function</replaceable> ,
+ RESET = <replaceable class="parameter">reset_function</replaceable> ,
+ COST = <replaceable class="parameter">cost_function</replaceable>
+ [ , EXAMINETUPLE = <replaceable class="parameter">examinetuple_function</replaceable> ]
+ [ , SEQSCAN = <replaceable class="parameter">seqscan</replaceable> ]
+ [ , PAGEMODE = <replaceable class="parameter">pagemode</replaceable> ]
+)
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>CREATE TABLESAMPLE METHOD</command> creates a tablesample method.
+ A tablesample method provides an algorithm for reading a sample of a table
+ when used in the <command>TABLESAMPLE</> clause of a <command>SELECT</>
+ statement.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>CREATE TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of the tablesample method to be created. This name must be
+ unique within the database.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">init_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the init function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nextblock_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-block function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nexttuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-tuple function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">end_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the end function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">reset_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the reset function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">cost_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the costing function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">examinetuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the function for inspecting the tuple contents in order
+ to decide whether the tuple should be returned or not. This parameter
+ is optional.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">seqscan</replaceable></term>
+ <listitem>
+ <para>
+ True if the sampling method will do a sequential scan of the whole table.
+ Used for cost estimation and syncscan. The default value if not specified
+ is False.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">pagemode</replaceable></term>
+ <listitem>
+ <para>
+ True if the sampling method will read a whole page at a time. The default
+ value if not specified is False.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>
+ The function names can be schema-qualified if necessary. Argument types
+ are not given, since the argument list for each type of function is
+ predetermined. All functions except the examinetuple function are required.
+ </para>
+
+ <para>
+ The arguments can appear in any order, not only the one shown above.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>CREATE TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-droptablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_tablesamplemethod.sgml b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
new file mode 100644
index 0000000..dffd2ec
--- /dev/null
+++ b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
@@ -0,0 +1,87 @@
+<!--
+doc/src/sgml/ref/drop_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPTABLESAMPLEMETHOD">
+ <indexterm zone="sql-droptablesamplemethod">
+ <primary>DROP TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP TABLESAMPLE METHOD</refname>
+ <refpurpose>remove a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP TABLESAMPLE METHOD [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP TABLESAMPLE METHOD</command> drops an existing tablesample
+ method.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>DROP TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the tablesample method does not exist.
+ A notice is issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an existing tablesample method to be removed.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>DROP TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createtablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 65ad795..4f55893 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -106,6 +106,7 @@
&createServer;
&createTable;
&createTableAs;
+ &createTablesampleMethod;
&createTableSpace;
&createTSConfig;
&createTSDictionary;
@@ -147,6 +148,7 @@
&dropSequence;
&dropServer;
&dropTable;
+ &dropTablesampleMethod;
&dropTableSpace;
&dropTSConfig;
&dropTSDictionary;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index bacb242..6acb5b3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -46,6 +46,7 @@
#include "catalog/pg_policy.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -157,7 +158,8 @@ static const Oid object_classes[MAX_OCLASS] = {
DefaultAclRelationId, /* OCLASS_DEFACL */
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
- PolicyRelationId /* OCLASS_POLICY */
+ PolicyRelationId, /* OCLASS_POLICY */
+ TableSampleMethodRelationId /* OCLASS_TABLESAMPLEMETHOD */
};
@@ -1265,6 +1267,10 @@ doDeletion(const ObjectAddress *object, int flags)
RemovePolicyById(object->objectId);
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ RemoveTablesampleMethodById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -1794,6 +1800,10 @@ find_expr_references_walker(Node *node,
case RTE_RELATION:
add_object_address(OCLASS_CLASS, rte->relid, 0,
context->addrs);
+ if (rte->tablesample)
+ add_object_address(OCLASS_TABLESAMPLEMETHOD,
+ rte->tablesample->tsmid, 0,
+ context->addrs);
break;
default:
break;
@@ -2373,6 +2383,9 @@ getObjectClass(const ObjectAddress *object)
case PolicyRelationId:
return OCLASS_POLICY;
+
+ case TableSampleMethodRelationId:
+ return OCLASS_TABLESAMPLEMETHOD;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index e82a448..3f47076 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -44,6 +44,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_policy.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -429,7 +430,19 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
- }
+ },
+ {
+ TableSampleMethodRelationId,
+ TableSampleMethodOidIndexId,
+ TABLESAMPLEMETHODOID,
+ TABLESAMPLEMETHODNAME,
+ Anum_pg_tablesample_method_tsmname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ true
+ },
};
/*
@@ -528,7 +541,9 @@ ObjectTypeMap[] =
/* OCLASS_EVENT_TRIGGER */
{ "event trigger", OBJECT_EVENT_TRIGGER },
/* OCLASS_POLICY */
- { "policy", OBJECT_POLICY }
+ { "policy", OBJECT_POLICY },
+ /* OCLASS_TABLESAMPLEMETHOD */
+ { "tablesample method", OBJECT_TABLESAMPLEMETHOD }
};
const ObjectAddress InvalidObjectAddress =
@@ -683,6 +698,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_FDW:
case OBJECT_FOREIGN_SERVER:
case OBJECT_EVENT_TRIGGER:
+ case OBJECT_TABLESAMPLEMETHOD:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -921,6 +937,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_EVENT_TRIGGER:
msg = gettext_noop("event trigger name cannot be qualified");
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -981,6 +1000,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_event_trigger_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ address.classId = TableSampleMethodRelationId;
+ address.objectId = get_tablesample_method_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -2044,6 +2068,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
break;
case OBJECT_TSPARSER:
case OBJECT_TSTEMPLATE:
+ case OBJECT_TABLESAMPLEMETHOD:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -2982,6 +3007,21 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("tablesample method %s"),
+ NameStr(((Form_pg_tablesample_method) GETSTRUCT(tup))->tsmname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -3459,6 +3499,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "policy");
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ appendStringInfoString(&buffer, "tablesample method");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4381,6 +4425,23 @@ getObjectIdentityParts(const ObjectAddress *object,
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+ Form_pg_tablesample_method tsmForm;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ tsmForm = (Form_pg_tablesample_method) GETSTRUCT(tup);
+ appendStringInfoString(&buffer,
+ quote_identifier(NameStr(tsmForm->tsmname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..04fcd8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o tablecmds.o tablesample.o \
+ tablespace.o trigger.o tsearchcmds.o typecmds.o user.o vacuum.o \
+ vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index a1b0d4d..c307dcf 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -429,6 +429,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
}
}
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
default:
elog(ERROR, "unexpected object type (%d)", (int) objtype);
break;
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 4bcc327..9e9ef38 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -97,6 +97,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SEQUENCE", true},
{"SERVER", true},
{"TABLE", true},
+ {"TABLESAMPLE METHOD", true},
{"TABLESPACE", false},
{"TRIGGER", true},
{"TEXT SEARCH CONFIGURATION", true},
@@ -1089,6 +1090,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_SEQUENCE:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
+ case OBJECT_TABLESAMPLEMETHOD:
case OBJECT_TRIGGER:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
@@ -1146,6 +1148,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_TABLESAMPLEMETHOD:
return true;
case MAX_OCLASS:
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 002319e..7790afc 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8256,6 +8256,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
case OCLASS_USER_MAPPING:
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
+ case OCLASS_TABLESAMPLEMETHOD:
/*
* We don't expect any of these sorts of objects to depend on
diff --git a/src/backend/commands/tablesample.c b/src/backend/commands/tablesample.c
new file mode 100644
index 0000000..f40820b
--- /dev/null
+++ b/src/backend/commands/tablesample.c
@@ -0,0 +1,427 @@
+/*-------------------------------------------------------------------------
+ *
+ * tablesample.c
+ * Commands to manipulate tablesample methods
+ *
+ * Table sampling methods provide algorithms for doing a sample scan over
+ * a table.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/tablesample.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "parser/parse_func.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+
+
+static Datum
+get_tablesample_method_func(DefElem *defel, int attnum)
+{
+ List *funcName = defGetQualifiedName(defel);
+ /* Big enough size for our needs. */
+ Oid *typeId = palloc0(7 * sizeof(Oid));
+ Oid retTypeId;
+ int nargs;
+ Oid procOid = InvalidOid;
+ FuncCandidateList clist;
+
+ switch (attnum)
+ {
+ case Anum_pg_tablesample_method_tsminit:
+ /*
+ * tsminit needs special handling because it is defined as a function
+ * with 3 or more arguments, of which only the first two must have a
+ * specific type; the rest is up to the tablesample method creator.
+ */
+ {
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ retTypeId = VOIDOID;
+
+ clist = FuncnameGetCandidates(funcName, -1, NIL, false, false, false);
+
+ while (clist)
+ {
+ if (clist->nargs >= 3 &&
+ memcmp(typeId, clist->args, nargs * sizeof(Oid)) == 0)
+ {
+ procOid = clist->oid;
+ /* Save real function signature for future errors. */
+ nargs = clist->nargs;
+ pfree(typeId);
+ typeId = clist->args;
+ break;
+ }
+ clist = clist->next;
+ }
+
+ if (!OidIsValid(procOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("function \"%s\" does not exist or does not have valid signature",
+ NameListToString(funcName)),
+ errhint("The tablesample method init function "
+ "must have at least 3 input parameters, "
+ "with the first of type INTERNAL and the second of type INTEGER.")));
+ }
+ break;
+
+ case Anum_pg_tablesample_method_tsmnextblock:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = INT4OID;
+ break;
+ case Anum_pg_tablesample_method_tsmnexttuple:
+ nargs = 3;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INT2OID;
+ retTypeId = INT2OID;
+ break;
+ case Anum_pg_tablesample_method_tsmexaminetuple:
+ nargs = 4;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = BOOLOID;
+ retTypeId = BOOLOID;
+ break;
+ case Anum_pg_tablesample_method_tsmend:
+ case Anum_pg_tablesample_method_tsmreset:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ case Anum_pg_tablesample_method_tsmcost:
+ nargs = 7;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INTERNALOID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = INTERNALOID;
+ typeId[4] = INTERNALOID;
+ typeId[5] = INTERNALOID;
+ typeId[6] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ default:
+ /* should not be here */
+ elog(ERROR, "unrecognized attribute for tablesample method: %d",
+ attnum);
+ nargs = 0; /* keep compiler quiet */
+ }
+
+ if (!OidIsValid(procOid))
+ procOid = LookupFuncName(funcName, nargs, typeId, false);
+ if (get_func_rettype(procOid) != retTypeId)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("function %s should return type %s",
+ func_signature_string(funcName, nargs, NIL, typeId),
+ format_type_be(retTypeId))));
+
+ return ObjectIdGetDatum(procOid);
+}
+
+/*
+ * make pg_depend entries for a new pg_tablesample_method entry
+ */
+static void
+makeTablesampleMethodDeps(HeapTuple tuple)
+{
+ Form_pg_tablesample_method tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ ObjectAddress myself,
+ referenced;
+
+ myself.classId = TableSampleMethodRelationId;
+ myself.objectId = HeapTupleGetOid(tuple);
+ myself.objectSubId = 0;
+
+ /* dependency on extension */
+ recordDependencyOnCurrentExtension(&myself, false);
+
+ /* dependencies on functions */
+ referenced.classId = ProcedureRelationId;
+ referenced.objectSubId = 0;
+
+ referenced.objectId = tsm->tsminit;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnextblock;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnexttuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ if (OidIsValid(tsm->tsmexaminetuple))
+ {
+ referenced.objectId = tsm->tsmexaminetuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ }
+
+ referenced.objectId = tsm->tsmend;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmreset;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmcost;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+}
+
+/*
+ * Create a table sampling method
+ *
+ * Only superusers can create table sampling methods.
+ */
+Oid
+DefineTablesampleMethod(List *names, List *parameters)
+{
+ char *tsmname = strVal(linitial(names));
+ Oid tsmoid;
+ ListCell *pl;
+ Relation rel;
+ Datum values[Natts_pg_tablesample_method];
+ bool nulls[Natts_pg_tablesample_method];
+ HeapTuple tuple;
+
+ /* Must be superuser. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to create tablesample method \"%s\"",
+ tsmname),
+ errhint("Must be superuser to create a tablesample method.")));
+
+ /* Must not already exist. */
+ tsmoid = get_tablesample_method_oid(tsmname, true);
+ if (OidIsValid(tsmoid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("tablesample method \"%s\" already exists",
+ tsmname)));
+
+ /* Initialize the values. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_tablesample_method_tsmname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(tsmname));
+
+ /*
+ * loop over the definition list and extract the information we need.
+ */
+ foreach(pl, parameters)
+ {
+ DefElem *defel = (DefElem *) lfirst(pl);
+
+ if (pg_strcasecmp(defel->defname, "seqscan") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmseqscan - 1] =
+ BoolGetDatum(defGetBoolean(defel));
+ }
+ else if (pg_strcasecmp(defel->defname, "pagemode") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmpagemode - 1] =
+ BoolGetDatum(defGetBoolean(defel));
+ }
+ else if (pg_strcasecmp(defel->defname, "init") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsminit - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsminit);
+ }
+ else if (pg_strcasecmp(defel->defname, "nextblock") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnextblock - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnextblock);
+ }
+ else if (pg_strcasecmp(defel->defname, "nexttuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnexttuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnexttuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "examinetuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmexaminetuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmexaminetuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "end") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmend - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmend);
+ }
+ else if (pg_strcasecmp(defel->defname, "reset") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmreset - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmreset);
+ }
+ else if (pg_strcasecmp(defel->defname, "cost") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmcost - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmcost);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("tablesample method parameter \"%s\" not recognized",
+ defel->defname)));
+ }
+
+ /*
+ * Validation.
+ */
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsminit - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method init function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnextblock - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nextblock function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnexttuple - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nexttuple function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmend - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method end function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmreset - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method reset function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmcost - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method cost function is required")));
+
+ /*
+ * Insert tuple into pg_tablesample_method.
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = heap_form_tuple(rel->rd_att, values, nulls);
+
+ tsmoid = simple_heap_insert(rel, tuple);
+
+ CatalogUpdateIndexes(rel, tuple);
+
+ makeTablesampleMethodDeps(tuple);
+
+ heap_freetuple(tuple);
+
+ /* Post creation hook for new tablesample method */
+ InvokeObjectPostCreateHook(TableSampleMethodRelationId, tsmoid, 0);
+
+ heap_close(rel, RowExclusiveLock);
+
+ return tsmoid;
+}
+
+/*
+ * Drop a tablesample method.
+ */
+void
+RemoveTablesampleMethodById(Oid tsmoid)
+{
+ Relation rel;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+
+ /*
+ * Find the target tuple
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tsmoid));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ tsmoid);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ /* Can't drop builtin tablesample methods. */
+ if (tsmoid == TABLESAMPLE_METHOD_SYSTEM_OID ||
+ tsmoid == TABLESAMPLE_METHOD_BERNOULLI_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied for tablesample method %s",
+ NameStr(tsm->tsmname))));
+
+ /*
+ * Remove the pg_tablesample_method tuple (this will roll back if we fail below)
+ */
+ simple_heap_delete(rel, &tuple->t_self);
+
+ ReleaseSysCache(tuple);
+
+ heap_close(rel, RowExclusiveLock);
+}
+
+/*
+ * get_tablesample_method_oid - given a tablesample method name,
+ * look up the OID
+ *
+ * If missing_ok is false, throw an error if tablesample method name not found.
+ * If true, just return InvalidOid.
+ */
+Oid
+get_tablesample_method_oid(const char *tsmname, bool missing_ok)
+{
+ Oid result;
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(tsmname));
+ if (HeapTupleIsValid(tuple))
+ {
+ result = HeapTupleGetOid(tuple);
+ ReleaseSysCache(tuple);
+ }
+ else
+ result = InvalidOid;
+
+ if (!OidIsValid(result) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ tsmname)));
+
+ return result;
+}
+
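
Reviewer note on get_tablesample_method_func() above: the init lookup accepts the first candidate that has at least three arguments and whose first two argument types are (internal, int4). The standalone sketch below models only that matching rule. The type constants and the Candidate struct are hypothetical stand-ins for Oid values and FuncCandidateList, not the real PostgreSQL definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for type OIDs; the values are illustrative only. */
enum { T_INTERNAL = 1, T_INT4 = 2, T_FLOAT4 = 3, T_TEXT = 4 };

/* Hypothetical simplified list node, modelled on FuncCandidateList. */
typedef struct Candidate
{
    struct Candidate *next;
    int         id;         /* plays the role of the function OID */
    int         nargs;      /* number of declared arguments */
    const int  *args;       /* argument types, nargs entries */
} Candidate;

/*
 * Return the id of the first candidate that has at least 3 arguments and
 * whose first two argument types are (internal, int4); 0 if none matches.
 */
static int
find_init_candidate(const Candidate *clist)
{
    static const int required[2] = { T_INTERNAL, T_INT4 };

    for (; clist != NULL; clist = clist->next)
    {
        assert(clist->args != NULL);

        /* nargs is tested first, so args holds at least 3 entries here */
        if (clist->nargs >= 3 &&
            memcmp(required, clist->args, sizeof(required)) == 0)
            return clist->id;
    }
    return 0;
}
```

Because only the two-argument prefix is compared, any overload with a matching prefix is accepted, and with several such overloads the first one in the list is used.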
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 711cdd5..d3f1273 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -590,7 +590,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
- MAPPING MATCH MATERIALIZED MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
+ MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P
+ MOVE
NAME_P NAMES NATIONAL NATURAL NCHAR NEXT NO NONE
NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -5103,6 +5104,15 @@ DefineStmt:
n->definition = list_make1(makeDefElem("from", (Node *) $5));
$$ = (Node *)n;
}
+ | CREATE TABLESAMPLE METHOD name definition
+ {
+ DefineStmt *n = makeNode(DefineStmt);
+ n->kind = OBJECT_TABLESAMPLEMETHOD;
+ n->args = NIL;
+ n->defnames = list_make1(makeString($4));
+ n->definition = $5;
+ $$ = (Node *)n;
+ }
;
definition: '(' def_list ')' { $$ = $2; }
@@ -5557,6 +5567,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | TABLESAMPLE METHOD { $$ = OBJECT_TABLESAMPLEMETHOD; }
;
any_name_list:
@@ -13417,6 +13428,7 @@ unreserved_keyword:
| MATCH
| MATERIALIZED
| MAXVALUE
+ | METHOD
| MINUTE_P
| MINVALUE
| MODE
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index fd09d3a..cadf6b4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -23,6 +23,7 @@
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/namespace.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/toasting.h"
#include "commands/alter.h"
#include "commands/async.h"
@@ -1136,6 +1137,11 @@ ProcessUtilitySlow(Node *parsetree,
Assert(stmt->args == NIL);
DefineCollation(stmt->defnames, stmt->definition);
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ Assert(stmt->args == NIL);
+ Assert(list_length(stmt->defnames) == 1);
+ DefineTablesampleMethod(stmt->defnames, stmt->definition);
+ break;
default:
elog(ERROR, "unrecognized define stmt type: %d",
(int) stmt->kind);
@@ -2004,6 +2010,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_POLICY:
tag = "DROP POLICY";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "DROP TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
@@ -2100,6 +2109,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_COLLATION:
tag = "CREATE COLLATION";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "CREATE TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 1a0a587..8a64e4b 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -103,6 +103,7 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
int numForeignServers;
int numDefaultACLs;
int numEventTriggers;
+ int numTSMs;
if (g_verbose)
write_msg(NULL, "reading schemas\n");
@@ -251,6 +252,10 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
write_msg(NULL, "reading policies\n");
getPolicies(fout, tblinfo, numTables);
+ if (g_verbose)
+ write_msg(NULL, "reading tablesample methods\n");
+ getTableSampleMethods(fout, &numTSMs);
+
*numTablesPtr = numTables;
return tblinfo;
}
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7da5c41..324ca4e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -182,6 +182,7 @@ static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo);
static void dumpIndex(Archive *fout, DumpOptions *dopt, IndxInfo *indxinfo);
static void dumpConstraint(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
static void dumpTableConstraintComment(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
+static void dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo);
static void dumpTSParser(Archive *fout, DumpOptions *dopt, TSParserInfo *prsinfo);
static void dumpTSDictionary(Archive *fout, DumpOptions *dopt, TSDictInfo *dictinfo);
static void dumpTSTemplate(Archive *fout, DumpOptions *dopt, TSTemplateInfo *tmplinfo);
@@ -7134,6 +7135,78 @@ getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tblinfo, int numTable
}
/*
+ * getTableSampleMethods:
+ * read all tablesample methods in the system catalogs and return them
+ * in the TSMInfo* structure
+ *
+ * numTSMs is set to the number of tablesample methods read in
+ */
+TSMInfo *
+getTableSampleMethods(Archive *fout, int *numTSMs)
+{
+ PGresult *res;
+ int ntups;
+ int i;
+ PQExpBuffer query;
+ TSMInfo *tsminfo;
+ int i_tableoid,
+ i_oid,
+ i_tsmname,
+ i_tsmseqscan,
+ i_tsmpagemode;
+
+ /* Before 9.5, there were no tablesample methods */
+ if (fout->remoteVersion < 90500)
+ {
+ *numTSMs = 0;
+ return NULL;
+ }
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT tableoid, oid, tsmname, tsmseqscan, tsmpagemode "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid >= '%u'::pg_catalog.oid",
+ FirstNormalObjectId);
+
+ res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+ ntups = PQntuples(res);
+ *numTSMs = ntups;
+
+ tsminfo = (TSMInfo *) pg_malloc(ntups * sizeof(TSMInfo));
+
+ i_tableoid = PQfnumber(res, "tableoid");
+ i_oid = PQfnumber(res, "oid");
+ i_tsmname = PQfnumber(res, "tsmname");
+ i_tsmseqscan = PQfnumber(res, "tsmseqscan");
+ i_tsmpagemode = PQfnumber(res, "tsmpagemode");
+
+ for (i = 0; i < ntups; i++)
+ {
+ tsminfo[i].dobj.objType = DO_TABLESAMPLE_METHOD;
+ tsminfo[i].dobj.catId.tableoid = atooid(PQgetvalue(res, i, i_tableoid));
+ tsminfo[i].dobj.catId.oid = atooid(PQgetvalue(res, i, i_oid));
+ AssignDumpId(&tsminfo[i].dobj);
+ tsminfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_tsmname));
+ tsminfo[i].dobj.namespace = NULL;
+ tsminfo[i].tsmseqscan = PQgetvalue(res, i, i_tsmseqscan)[0] == 't';
+ tsminfo[i].tsmpagemode = PQgetvalue(res, i, i_tsmpagemode)[0] == 't';
+
+ /* Decide whether we want to dump it */
+ selectDumpableObject(&(tsminfo[i].dobj));
+ }
+
+ PQclear(res);
+
+ destroyPQExpBuffer(query);
+
+ return tsminfo;
+}
+
+
+/*
* Test whether a column should be printed as part of table's CREATE TABLE.
* Column number is zero-based.
*
@@ -8226,6 +8299,9 @@ dumpDumpableObject(Archive *fout, DumpOptions *dopt, DumpableObject *dobj)
case DO_POLICY:
dumpPolicy(fout, dopt, (PolicyInfo *) dobj);
break;
+ case DO_TABLESAMPLE_METHOD:
+ dumpTableSampleMethod(fout, dopt, (TSMInfo *) dobj);
+ break;
case DO_PRE_DATA_BOUNDARY:
case DO_POST_DATA_BOUNDARY:
/* never dumped, nothing to do */
@@ -12226,6 +12302,106 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
}
/*
+ * dumpTableSampleMethod
+ * write the declaration of one user-defined tablesample method
+ */
+static void
+dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo)
+{
+ PGresult *res;
+ PQExpBuffer q;
+ PQExpBuffer delq;
+ PQExpBuffer labelq;
+ PQExpBuffer query;
+ char *tsminit;
+ char *tsmnextblock;
+ char *tsmnexttuple;
+ char *tsmexaminetuple;
+ char *tsmend;
+ char *tsmreset;
+ char *tsmcost;
+
+ /* Skip if not to be dumped */
+ if (!tsminfo->dobj.dump || dopt->dataOnly)
+ return;
+
+ q = createPQExpBuffer();
+ delq = createPQExpBuffer();
+ labelq = createPQExpBuffer();
+ query = createPQExpBuffer();
+
+ /* Make sure we are in proper schema */
+ selectSourceSchema(fout, "pg_catalog");
+
+ appendPQExpBuffer(query, "SELECT tsminit, tsmnextblock, "
+ "tsmnexttuple, tsmexaminetuple, "
+ "tsmend, tsmreset, tsmcost "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid = '%u'::pg_catalog.oid",
+ tsminfo->dobj.catId.oid);
+
+ res = ExecuteSqlQueryForSingleRow(fout, query->data);
+
+ tsminit = PQgetvalue(res, 0, PQfnumber(res, "tsminit"));
+ tsmnexttuple = PQgetvalue(res, 0, PQfnumber(res, "tsmnexttuple"));
+ tsmnextblock = PQgetvalue(res, 0, PQfnumber(res, "tsmnextblock"));
+ tsmexaminetuple = PQgetvalue(res, 0, PQfnumber(res, "tsmexaminetuple"));
+ tsmend = PQgetvalue(res, 0, PQfnumber(res, "tsmend"));
+ tsmreset = PQgetvalue(res, 0, PQfnumber(res, "tsmreset"));
+ tsmcost = PQgetvalue(res, 0, PQfnumber(res, "tsmcost"));
+
+ appendPQExpBuffer(q, "CREATE TABLESAMPLE METHOD %s (\n",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(q, " INIT = %s,\n", tsminit);
+ appendPQExpBuffer(q, " NEXTTUPLE = %s,\n", tsmnexttuple);
+ appendPQExpBuffer(q, " NEXTBLOCK = %s,\n", tsmnextblock);
+ appendPQExpBuffer(q, " END = %s,\n", tsmend);
+ appendPQExpBuffer(q, " RESET = %s,\n", tsmreset);
+ appendPQExpBuffer(q, " COST = %s", tsmcost);
+
+ if (strcmp(tsmexaminetuple, "-") != 0)
+ appendPQExpBuffer(q, ",\n EXAMINETUPLE = %s", tsmexaminetuple);
+
+ if (tsminfo->tsmseqscan)
+ appendPQExpBufferStr(q, ",\n SEQSCAN = true");
+
+ if (tsminfo->tsmpagemode)
+ appendPQExpBufferStr(q, ",\n PAGEMODE = true");
+
+ appendPQExpBufferStr(q, "\n);");
+
+ appendPQExpBuffer(delq, "DROP TABLESAMPLE METHOD %s;\n",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(labelq, "TABLESAMPLE METHOD %s",
+ fmtId(tsminfo->dobj.name));
+
+ if (dopt->binary_upgrade)
+ binary_upgrade_extension_member(q, &tsminfo->dobj, labelq->data);
+
+ ArchiveEntry(fout, tsminfo->dobj.catId, tsminfo->dobj.dumpId,
+ tsminfo->dobj.name,
+ NULL,
+ NULL,
+ "",
+ false, "TABLESAMPLE METHOD", SECTION_PRE_DATA,
+ q->data, delq->data, NULL,
+ NULL, 0,
+ NULL, NULL);
+
+ /* Dump tablesample method comments */
+ dumpComment(fout, dopt, labelq->data,
+ NULL, "",
+ tsminfo->dobj.catId, 0, tsminfo->dobj.dumpId);
+
+ PQclear(res);
+ destroyPQExpBuffer(q);
+ destroyPQExpBuffer(delq);
+ destroyPQExpBuffer(labelq);
+}
+
+/*
* dumpTSParser
* write out a single text search parser
*/
@@ -15659,6 +15835,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
case DO_FDW:
case DO_FOREIGN_SERVER:
case DO_BLOB:
+ case DO_TABLESAMPLE_METHOD:
/* Pre-data objects: must come before the pre-data boundary */
addObjectDependency(preDataBound, dobj->dumpId);
break;
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index a9d3c10..87bef24 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -76,7 +76,8 @@ typedef enum
DO_POST_DATA_BOUNDARY,
DO_EVENT_TRIGGER,
DO_REFRESH_MATVIEW,
- DO_POLICY
+ DO_POLICY,
+ DO_TABLESAMPLE_METHOD
} DumpableObjectType;
typedef struct _dumpableObject
@@ -383,6 +384,13 @@ typedef struct _inhInfo
Oid inhparent; /* OID of its parent */
} InhInfo;
+typedef struct _tsmInfo
+{
+ DumpableObject dobj;
+ bool tsmseqscan;
+ bool tsmpagemode;
+} TSMInfo;
+
typedef struct _prsInfo
{
DumpableObject dobj;
@@ -536,6 +544,7 @@ extern ProcLangInfo *getProcLangs(Archive *fout, int *numProcLangs);
extern CastInfo *getCasts(Archive *fout, DumpOptions *dopt, int *numCasts);
extern void getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tbinfo, int numTables);
extern bool shouldPrintColumn(DumpOptions *dopt, TableInfo *tbinfo, int colno);
+extern TSMInfo *getTableSampleMethods(Archive *fout, int *numTSMs);
extern TSParserInfo *getTSParsers(Archive *fout, int *numTSParsers);
extern TSDictInfo *getTSDictionaries(Archive *fout, int *numTSDicts);
extern TSTemplateInfo *getTSTemplates(Archive *fout, int *numTSTemplates);
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index c5ed593..9567cf6 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -73,7 +73,8 @@ static const int oldObjectTypePriority[] =
13, /* DO_POST_DATA_BOUNDARY */
20, /* DO_EVENT_TRIGGER */
15, /* DO_REFRESH_MATVIEW */
- 21 /* DO_POLICY */
+ 21, /* DO_POLICY */
+ 5 /* DO_TABLESAMPLE_METHOD */
};
/*
@@ -122,7 +123,8 @@ static const int newObjectTypePriority[] =
25, /* DO_POST_DATA_BOUNDARY */
32, /* DO_EVENT_TRIGGER */
33, /* DO_REFRESH_MATVIEW */
- 34 /* DO_POLICY */
+ 34, /* DO_POLICY */
+ 17 /* DO_TABLESAMPLE_METHOD */
};
static DumpId preDataBoundId;
@@ -1460,6 +1462,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
"POLICY (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
+ case DO_TABLESAMPLE_METHOD:
+ snprintf(buf, bufsize,
+ "TABLESAMPLE METHOD %s (ID %d OID %u)",
+ obj->name, obj->dumpId, obj->catId.oid);
+ return;
case DO_PRE_DATA_BOUNDARY:
snprintf(buf, bufsize,
"PRE-DATA BOUNDARY (ID %d)",
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 6481ac8..30653f8 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -148,6 +148,7 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
+ OCLASS_TABLESAMPLEMETHOD, /* pg_tablesample_method */
MAX_OCLASS /* MUST BE LAST */
} ObjectClass;
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
index ec826e3..bc22edd 100644
--- a/src/include/catalog/pg_tablesample_method.h
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -71,7 +71,18 @@ typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
DATA(insert OID = 3293 ( system false true tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
DESCR("SYSTEM table sampling method");
+#define TABLESAMPLE_METHOD_SYSTEM_OID 3293
DATA(insert OID = 3294 ( bernoulli true false tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
DESCR("BERNOULLI table sampling method");
+#define TABLESAMPLE_METHOD_BERNOULLI_OID 3294
+
+/* ----------------
+ * functions for manipulation of pg_tablesample_method
+ * ----------------
+ */
+
+extern Oid DefineTablesampleMethod(List *names, List *parameters);
+extern void RemoveTablesampleMethodById(Oid tsmoid);
+extern Oid get_tablesample_method_oid(const char *tsmname, bool missing_ok);
#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index a4288d1..359119a 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1291,6 +1291,7 @@ typedef enum ObjectType
OBJECT_SEQUENCE,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
+ OBJECT_TABLESAMPLEMETHOD,
OBJECT_TABLESPACE,
OBJECT_TRIGGER,
OBJECT_TSCONFIGURATION,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index ae90df8..902c189 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -236,6 +236,7 @@ PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD)
+PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD)
PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("minvalue", MINVALUE, UNRESERVED_KEYWORD)
PG_KEYWORD("mode", MODE, UNRESERVED_KEYWORD)
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 93d93af..37ea524 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -9,7 +9,8 @@ SUBDIRS = \
worker_spi \
dummy_seclabel \
test_shm_mq \
- test_parser
+ test_parser \
+ tablesample
all: submake-errcodes
diff --git a/src/test/modules/tablesample/.gitignore b/src/test/modules/tablesample/.gitignore
new file mode 100644
index 0000000..5dcb3ff
--- /dev/null
+++ b/src/test/modules/tablesample/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/tablesample/Makefile b/src/test/modules/tablesample/Makefile
new file mode 100644
index 0000000..469b004
--- /dev/null
+++ b/src/test/modules/tablesample/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/tablesample/Makefile
+
+MODULE_big = tsm_test
+OBJS = tsm_test.o $(WIN32RES)
+PGFILEDESC = "tsm_test - example of a custom tablesample method"
+
+EXTENSION = tsm_test
+DATA = tsm_test--1.0.sql
+
+REGRESS = tablesample
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/tablesample/expected/tablesample.out b/src/test/modules/tablesample/expected/tablesample.out
new file mode 100644
index 0000000..ad62e32
--- /dev/null
+++ b/src/test/modules/tablesample/expected/tablesample.out
@@ -0,0 +1,38 @@
+CREATE EXTENSION tsm_test;
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ c81e728d9d4c2f636f067f89cc14862c | 0.5
+ a87ff679a2f3e71d9181a67b7542122c | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ d3d9446802a44259755d38e6d163e820 | 0.5
+(6 rows)
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ e4da3b7fbbce2345d7772b0674a318d5 | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ c9f0f895fb98ab9159f51fd0297e236d | 0.5
+(5 rows)
+
+DROP TABLESAMPLE METHOD tsm_test;
+ERROR: cannot drop tablesample method tsm_test because extension tsm_test requires it
+HINT: You can drop extension tsm_test instead.
+DROP EXTENSION tsm_test;
+ERROR: cannot drop extension tsm_test because other objects depend on it
+DETAIL: view test_tsm_v depends on tablesample method tsm_test
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
+DROP EXTENSION tsm_test CASCADE;
+NOTICE: drop cascades to view test_tsm_v
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
+ tsmname | tsmseqscan | tsmpagemode | tsminit | tsmnextblock | tsmnexttuple | tsmexaminetuple | tsmend | tsmreset | tsmcost
+---------+------------+-------------+---------+--------------+--------------+-----------------+--------+----------+---------
+(0 rows)
+
diff --git a/src/test/modules/tablesample/sql/tablesample.sql b/src/test/modules/tablesample/sql/tablesample.sql
new file mode 100644
index 0000000..b1104d6
--- /dev/null
+++ b/src/test/modules/tablesample/sql/tablesample.sql
@@ -0,0 +1,14 @@
+CREATE EXTENSION tsm_test;
+
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+
+DROP TABLESAMPLE METHOD tsm_test;
+DROP EXTENSION tsm_test;
+DROP EXTENSION tsm_test CASCADE;
+
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
diff --git a/src/test/modules/tablesample/tsm_test--1.0.sql b/src/test/modules/tablesample/tsm_test--1.0.sql
new file mode 100644
index 0000000..e5a9ae8
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test--1.0.sql
@@ -0,0 +1,52 @@
+/* src/test/modules/tablesample/tsm_test--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION tsm_test" to load this file. \quit
+
+CREATE FUNCTION tsm_test_init(internal, int4, text)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nextblock(internal)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nexttuple(internal, int4, int2)
+RETURNS int2
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_examinetuple(internal, int4, internal, bool)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_end(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_reset(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_cost(internal, internal, internal, internal, internal, internal, internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+
+CREATE TABLESAMPLE METHOD tsm_test (
+ SEQSCAN = true,
+ PAGEMODE = true,
+ INIT = tsm_test_init,
+ NEXTBLOCK = tsm_test_nextblock,
+ NEXTTUPLE = tsm_test_nexttuple,
+ EXAMINETUPLE = tsm_test_examinetuple,
+ END = tsm_test_end,
+ RESET = tsm_test_reset,
+ COST = tsm_test_cost
+);
diff --git a/src/test/modules/tablesample/tsm_test.c b/src/test/modules/tablesample/tsm_test.c
new file mode 100644
index 0000000..be4dcb9
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.c
@@ -0,0 +1,228 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_test.c
+ * Simple example of a custom tablesample method
+ *
+ * Copyright (c) 2007-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/tablesample/tsm_test.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "access/relscan.h"
+#include "access/tupdesc.h"
+#include "catalog/pg_type.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+PG_MODULE_MAGIC;
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ AttrNumber attnum; /* column to check */
+ TupleDesc tupDesc; /* tuple descriptor of table */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} tsm_test_state;
+
+
+PG_FUNCTION_INFO_V1(tsm_test_init);
+PG_FUNCTION_INFO_V1(tsm_test_nextblock);
+PG_FUNCTION_INFO_V1(tsm_test_nexttuple);
+PG_FUNCTION_INFO_V1(tsm_test_examinetuple);
+PG_FUNCTION_INFO_V1(tsm_test_end);
+PG_FUNCTION_INFO_V1(tsm_test_reset);
+PG_FUNCTION_INFO_V1(tsm_test_cost);
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_test_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ char *attname;
+ AttrNumber attnum;
+ Oid atttype;
+ Relation rel = scanstate->ss.ss_currentRelation;
+ HeapScanDesc scan = scanstate->ss.ss_currentScanDesc;
+ tsm_test_state *state;
+ TupleDesc tupDesc = RelationGetDescr(rel);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("attnum cannot be NULL.")));
+
+ attname = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+ attnum = get_attnum(rel->rd_id, attname);
+ if (!AttrNumberIsForUserDefinedAttr(attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s does not exist.", attname)));
+
+ atttype = get_atttype(rel->rd_id, attnum);
+ if (atttype != FLOAT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s is not of type float.", attname)));
+
+ state = palloc0(sizeof(tsm_test_state));
+
+ /* Remember initial values for reinit */
+ state->seed = seed;
+ state->attnum = attnum;
+ state->tupDesc = tupDesc;
+ state->startblock = scan->rs_startblock;
+ state->nblocks = scan->rs_nblocks;
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+ sampler_random_init_state(state->seed, state->randstate);
+
+ scanstate->tsmdata = (void *) state;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_test_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ /* Cycle from startblock to startblock to support syncscan. */
+ if (state->blockno == InvalidBlockNumber)
+ state->blockno = state->startblock;
+ else
+ {
+ state->blockno++;
+
+ if (state->blockno >= state->nblocks)
+ state->blockno = 0;
+
+ if (state->blockno == state->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(state->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ */
+Datum
+tsm_test_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ if (state->lt == InvalidOffsetNumber)
+ state->lt = FirstOffsetNumber;
+ else if (++state->lt > maxoffset)
+ PG_RETURN_UINT16(InvalidOffsetNumber);
+
+ PG_RETURN_UINT16(state->lt);
+}
+
+/*
+ * Examine tuple and decide if it should be returned.
+ */
+Datum
+tsm_test_examinetuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ HeapTuple tuple = (HeapTuple) PG_GETARG_POINTER(2);
+ bool visible = PG_GETARG_BOOL(3);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+ bool isnull;
+ float8 val, rand;
+
+ if (!visible)
+ PG_RETURN_BOOL(false);
+
+ val = DatumGetFloat8(heap_getattr(tuple, state->attnum, state->tupDesc, &isnull));
+ rand = sampler_random_fract(state->randstate);
+ if (isnull || val < rand)
+ PG_RETURN_BOOL(false);
+ else
+ PG_RETURN_BOOL(true);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_test_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_test_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+
+ sampler_random_init_state(state->seed, state->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_test_cost(PG_FUNCTION_ARGS)
+{
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+
+ *pages = baserel->pages;
+
+ /* This is a very bad estimate */
+ *tuples = path->rows = path->rows/2;
+
+ PG_RETURN_VOID();
+}
+
diff --git a/src/test/modules/tablesample/tsm_test.control b/src/test/modules/tablesample/tsm_test.control
new file mode 100644
index 0000000..a7b2741
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.control
@@ -0,0 +1,5 @@
+# tsm_test extension
+comment = 'test module for custom tablesample method'
+default_version = '1.0'
+module_pathname = '$libdir/tsm_test'
+relocatable = true
--
1.9.1
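
For anyone who wants to try the block iteration from tsm_test_nextblock outside the backend, here is a minimal standalone sketch of the same startblock-to-startblock cycle. It is not part of the patch. The names block_cursor and next_block are made up for illustration, INVALID_BLOCK only stands in for InvalidBlockNumber, and the assertion on nblocks is an extra precondition assumed by the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for InvalidBlockNumber; an assumption of this sketch. */
#define INVALID_BLOCK UINT32_MAX

/* Hypothetical cursor mirroring startblock/nblocks/blockno in tsm_test_state. */
typedef struct
{
	uint32_t	startblock;		/* block where the scan begins */
	uint32_t	nblocks;		/* total blocks in the relation */
	uint32_t	blockno;		/* current block, INVALID_BLOCK before start */
} block_cursor;

/*
 * Return the next block to read, wrapping from the last block to block 0,
 * or INVALID_BLOCK once the cursor has come back around to startblock.
 */
static uint32_t
next_block(block_cursor *c)
{
	/* Precondition assumed here: the relation has at least one block. */
	assert(c->nblocks > 0);

	if (c->blockno == INVALID_BLOCK)
		c->blockno = c->startblock;
	else
	{
		c->blockno++;

		/* Wrap around at the end of the relation. */
		if (c->blockno >= c->nblocks)
			c->blockno = 0;

		/* Back at the starting block: every block was visited once. */
		if (c->blockno == c->startblock)
			return INVALID_BLOCK;
	}

	return c->blockno;
}
```

With startblock = 2 and nblocks = 4 the calls yield 2, 3, 0, 1 and then INVALID_BLOCK. Starting at an arbitrary block instead of block 0 is what lets the method use the start position chosen by synchronized scans.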
0002-tablesample-v11.patch (text/x-diff)
From f39a165a2273417bd8f36ce9af611d511ca39eed Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:37:55 +0100
Subject: [PATCH 2/4] tablesample v11
---
contrib/file_fdw/file_fdw.c | 2 +-
contrib/postgres_fdw/postgres_fdw.c | 2 +-
doc/src/sgml/catalogs.sgml | 120 ++++++
doc/src/sgml/ref/select.sgml | 38 +-
src/backend/access/Makefile | 3 +-
src/backend/access/heap/heapam.c | 43 ++-
src/backend/catalog/Makefile | 2 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/explain.c | 7 +
src/backend/executor/Makefile | 2 +-
src/backend/executor/execAmi.c | 8 +
src/backend/executor/execCurrent.c | 1 +
src/backend/executor/execProcnode.c | 14 +
src/backend/executor/nodeSamplescan.c | 556 ++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 60 +++
src/backend/nodes/equalfuncs.c | 37 ++
src/backend/nodes/nodeFuncs.c | 12 +
src/backend/nodes/outfuncs.c | 48 +++
src/backend/nodes/readfuncs.c | 45 +++
src/backend/optimizer/path/allpaths.c | 49 +++
src/backend/optimizer/path/costsize.c | 68 ++++
src/backend/optimizer/plan/createplan.c | 69 ++++
src/backend/optimizer/plan/setrefs.c | 11 +
src/backend/optimizer/plan/subselect.c | 1 +
src/backend/optimizer/util/pathnode.c | 22 ++
src/backend/parser/gram.y | 37 +-
src/backend/parser/parse_clause.c | 47 +++
src/backend/parser/parse_func.c | 131 +++++++
src/backend/utils/Makefile | 3 +-
src/backend/utils/adt/ruleutils.c | 50 +++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/misc/sampling.c | 33 +-
src/backend/utils/tablesample/Makefile | 17 +
src/backend/utils/tablesample/bernoulli.c | 224 +++++++++++
src/backend/utils/tablesample/system.c | 185 +++++++++
src/include/access/heapam.h | 4 +
src/include/access/relscan.h | 1 +
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_proc.h | 25 ++
src/include/catalog/pg_tablesample_method.h | 77 ++++
src/include/executor/nodeSamplescan.h | 24 ++
src/include/nodes/execnodes.h | 18 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/parsenodes.h | 36 ++
src/include/nodes/plannodes.h | 6 +
src/include/optimizer/cost.h | 1 +
src/include/optimizer/pathnode.h | 2 +
src/include/parser/kwlist.h | 1 +
src/include/parser/parse_func.h | 4 +
src/include/utils/rel.h | 1 -
src/include/utils/sampling.h | 15 +-
src/include/utils/syscache.h | 2 +
src/include/utils/tablesample.h | 27 ++
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/expected/tablesample.out | 168 +++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/tablesample.sql | 42 +++
58 files changed, 2403 insertions(+), 36 deletions(-)
create mode 100644 src/backend/executor/nodeSamplescan.c
create mode 100644 src/backend/utils/tablesample/Makefile
create mode 100644 src/backend/utils/tablesample/bernoulli.c
create mode 100644 src/backend/utils/tablesample/system.c
create mode 100644 src/include/catalog/pg_tablesample_method.h
create mode 100644 src/include/executor/nodeSamplescan.h
create mode 100644 src/include/utils/tablesample.h
create mode 100644 src/test/regress/expected/tablesample.out
create mode 100644 src/test/regress/sql/tablesample.sql
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d541..6a813a3 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1096,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 74ef792..5903384 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2543,7 +2543,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * sampler_random_fract());
+ pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate));
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d0b78f2..af808a0 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -269,6 +269,11 @@
</row>
<row>
+ <entry><link linkend="catalog-pg-tablesample-method"><structname>pg_tablesample_method</structname></link></entry>
+ <entry>table sampling methods</entry>
+ </row>
+
+ <row>
<entry><link linkend="catalog-pg-tablespace"><structname>pg_tablespace</structname></link></entry>
<entry>tablespaces within this database cluster</entry>
</row>
@@ -5980,6 +5985,121 @@
</sect1>
+ <sect1 id="catalog-pg-tablesample-method">
+ <title><structname>pg_tablesample_method</structname></title>
+
+ <indexterm zone="catalog-pg-tablesample-method">
+ <primary>pg_tablesample_method</primary>
+ </indexterm>
+
+ <para>
+ The catalog <structname>pg_tablesample_method</structname> stores
+ information about table sampling methods which can be used in
+ <command>TABLESAMPLE</command> clause of a <command>SELECT</command>
+ statement.
+ </para>
+
+ <table>
+ <title><structname>pg_tablesample_method</> Columns</title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+
+ <row>
+ <entry><structfield>oid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry></entry>
+ <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmname</structfield></entry>
+ <entry><type>name</type></entry>
+ <entry></entry>
+ <entry>Name of the sampling method</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmseqscan</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method scan the whole table sequentially?
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmpagemode</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method always read whole pages?
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsminit</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Initialize the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnextblock</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next block number</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnexttuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next tuple offset</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmexaminetuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Function which examines the tuple contents and decides whether to
+ return it, or zero if none</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmend</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>End the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmreset</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Restart the state of sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmcost</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Costing function</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect1>
+
+
<sect1 id="catalog-pg-tablespace">
<title><structname>pg_tablespace</structname></title>
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 2295f63..c16285e 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,42 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ A <literal>TABLESAMPLE</literal> clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of the rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ one of:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept a single argument,
+ which is the percentage (floating point from 0 to 100) of the rows to
+ be returned.
+ The <literal>SYSTEM</literal> sampling method does block-level
+ sampling, with each block having the same chance of being selected, and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> method scans the whole table and returns
+ individual rows with equal probability.
+ The optional numeric parameter <literal>REPEATABLE</literal> is used
+ as the random seed for sampling. Note that subsequent commands may return
+ different results even if the same <literal>REPEATABLE</literal> clause
+ was specified. This happens because <acronym>DML</acronym> statements
+ and maintenance operations such as <command>VACUUM</> affect the physical
+ distribution of data.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..238057a 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ transam
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb6f8a3..76c2d3a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -79,8 +79,9 @@ bool synchronize_seqscans = true;
static HeapScanDesc heap_beginscan_internal(Relation relation,
Snapshot snapshot,
int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync,
- bool is_bitmapscan, bool temp_snap);
+ bool allow_strat, bool allow_sync, bool allow_pagemode,
+ bool is_bitmapscan, bool is_samplescan,
+ bool temp_snap);
static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -293,9 +294,10 @@ initscan(HeapScanDesc scan, ScanKey key, bool is_rescan)
/*
* Currently, we don't have a stats counter for bitmap heap scans (but the
- * underlying bitmap index scans will be counted).
+ * underlying bitmap index scans will be counted) or sample scans (we only
+ * update stats for tuple fetches there).
*/
- if (!scan->rs_bitmapscan)
+ if (!scan->rs_bitmapscan && !scan->rs_samplescan)
pgstat_count_heap_scan(scan->rs_rd);
}
@@ -314,7 +316,7 @@ heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, BlockNumber numBlks)
* In page-at-a-time mode it performs additional work, namely determining
* which tuples on the page are visible.
*/
-static void
+void
heapgetpage(HeapScanDesc scan, BlockNumber page)
{
Buffer buffer;
@@ -1289,7 +1291,7 @@ heap_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
* heap_beginscan - begin relation scan
*
* heap_beginscan_strat offers an extended API that lets the caller control
- * whether a nondefault buffer access strategy can be used, and whether
+ * whether a nondefault buffer access strategy can be used and whether
* syncscan can be chosen (possibly resulting in the scan not starting from
* block zero). Both of these default to TRUE with plain heap_beginscan.
*
@@ -1297,6 +1299,9 @@ heap_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
* HeapScanDesc for a bitmap heap scan. Although that scan technology is
* really quite unlike a standard seqscan, there is just enough commonality
* to make it worth using the same data structure.
+ *
+ * heap_beginscan_ss is an alternate entry point for setting up a
+ * HeapScanDesc for a TABLESAMPLE scan.
* ----------------
*/
HeapScanDesc
@@ -1304,7 +1309,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- true, true, false, false);
+ true, true, true, false, false, false);
}
HeapScanDesc
@@ -1314,7 +1319,7 @@ heap_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
Snapshot snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- true, true, false, true);
+ true, true, true, false, false, true);
}
HeapScanDesc
@@ -1323,7 +1328,8 @@ heap_beginscan_strat(Relation relation, Snapshot snapshot,
bool allow_strat, bool allow_sync)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- allow_strat, allow_sync, false, false);
+ allow_strat, allow_sync, true,
+ false, false, false);
}
HeapScanDesc
@@ -1331,14 +1337,24 @@ heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- false, false, true, false);
+ false, false, true, true, false, false);
+}
+
+HeapScanDesc
+heap_beginscan_ss(Relation relation, Snapshot snapshot,
+ int nkeys, ScanKey key,
+ bool allow_strat, bool allow_pagemode)
+{
+ return heap_beginscan_internal(relation, snapshot, nkeys, key,
+ allow_strat, false, allow_pagemode,
+ false, true, false);
}
static HeapScanDesc
heap_beginscan_internal(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync,
- bool is_bitmapscan, bool temp_snap)
+ bool allow_strat, bool allow_sync, bool allow_pagemode,
+ bool is_bitmapscan, bool is_samplescan, bool temp_snap)
{
HeapScanDesc scan;
@@ -1360,6 +1376,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
scan->rs_snapshot = snapshot;
scan->rs_nkeys = nkeys;
scan->rs_bitmapscan = is_bitmapscan;
+ scan->rs_samplescan = is_samplescan;
scan->rs_strategy = NULL; /* set in initscan */
scan->rs_allow_strat = allow_strat;
scan->rs_allow_sync = allow_sync;
@@ -1368,7 +1385,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
/*
* we can use page-at-a-time mode if it's an MVCC-safe snapshot
*/
- scan->rs_pageatatime = IsMVCCSnapshot(snapshot);
+ scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(snapshot);
/*
* For a seqscan in a serializable transaction, acquire a predicate lock
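The heapam.c hunks above pass six positional booleans to heap_beginscan_internal, which is easy to misread in a diff. As a reading aid, here is a minimal standalone sketch of the flag combinations each entry point ends up with. The ScanFlags struct and the helper names are hypothetical and exist only for illustration; they are not part of the patch.

```c
#include <stdbool.h>

/* Hypothetical mirror of the six flags taken by heap_beginscan_internal. */
typedef struct ScanFlags
{
    bool allow_strat;
    bool allow_sync;
    bool allow_pagemode;
    bool is_bitmapscan;
    bool is_samplescan;
    bool temp_snap;
} ScanFlags;

/* heap_beginscan: plain scan, everything allowed. */
ScanFlags
flags_plain(void)
{
    ScanFlags f = {true, true, true, false, false, false};

    return f;
}

/* heap_beginscan_bm: no strategy, no syncscan, page mode still allowed. */
ScanFlags
flags_bitmap(void)
{
    ScanFlags f = {false, false, true, true, false, false};

    return f;
}

/* heap_beginscan_ss: syncscan is always off, the caller picks the rest. */
ScanFlags
flags_sample(bool allow_strat, bool allow_pagemode)
{
    ScanFlags f = {allow_strat, false, allow_pagemode, false, true, false};

    return f;
}

/* rs_pageatatime as the patch computes it. */
bool
use_page_mode(ScanFlags f, bool snapshot_is_mvcc)
{
    return f.allow_pagemode && snapshot_is_mvcc;
}

/* initscan bumps the heap scan counter only for these scans. */
bool
counts_as_heap_scan(ScanFlags f)
{
    return !f.is_bitmapscan && !f.is_samplescan;
}
```

With this mapping, a sample scan that asks for page mode still falls back to tuple-at-a-time checks when the snapshot is not MVCC-safe, same as any other scan.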
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..34db3e6 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesample_method.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 5730f26..eb0da5b 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1147,7 +1147,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 315a528..90190cd 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -732,6 +732,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -957,6 +958,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ pname = sname = "Sample Scan";
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1074,6 +1078,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1326,6 +1331,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2224,6 +2230,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 6ebad2f..4948a26 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index d87be96..bcd287f 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 9892499..03c2feb 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..c7951d1
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,556 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/pg_tablesample_method.h"
+#include "executor/executor.h"
+#include "executor/nodeSamplescan.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/predicate.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate,
+ int eflags, TableSampleClause *tablesample);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+static HeapTuple samplenexttup(SampleScanState *node, HeapScanDesc scan);
+
+
+/*
+ * Initialize the sampling method - loads function info and
+ * calls the tsminit function.
+ *
+ * We need special handling for this because the tsminit function
+ * is allowed to take optional additional arguments.
+ */
+static void
+InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(scanstate->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(scanstate->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(scanstate->tsmnexttuple));
+ if (OidIsValid(tablesample->tsmexaminetuple))
+ fmgr_info(tablesample->tsmexaminetuple, &(scanstate->tsmexaminetuple));
+ else
+ scanstate->tsmexaminetuple.fn_oid = InvalidOid;
+ fmgr_info(tablesample->tsmend, &(scanstate->tsmend));
+ fmgr_info(tablesample->tsmreset, &(scanstate->tsmreset));
+
+ InitFunctionCallInfoData(fcinfo, &scanstate->tsminit,
+ list_length(args) + 2,
+ InvalidOid, NULL, NULL);
+
+ /* First arg is always SampleScanState */
+ fcinfo.arg[0] = PointerGetDatum(scanstate);
+ fcinfo.argnull[0] = false;
+
+ /*
+ * Second arg is always the REPEATABLE seed.
+ * When tablesample->repeatable is NULL, the REPEATABLE clause was not
+ * specified and a random seed is passed instead.
+ * When specified, the expression must not evaluate to NULL.
+ */
+ if (tablesample->repeatable)
+ {
+ ExprState *argstate = ExecInitExpr((Expr *) tablesample->repeatable,
+ (PlanState *) scanstate);
+ fcinfo.arg[1] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[1], NULL);
+ if (fcinfo.argnull[1])
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("REPEATABLE clause cannot be NULL")));
+ }
+ else
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+
+ i = 2;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ /* No expression to evaluate, pass a NULL argument. */
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ (void) FunctionCallInvoke(&fcinfo);
+}
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is a workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ TupleTableSlot *slot;
+ HeapScanDesc scan;
+ HeapTuple tuple;
+
+ /*
+ * get information from the scan state
+ */
+ slot = node->ss.ss_ScanTupleSlot;
+ scan = node->ss.ss_currentScanDesc;
+
+ tuple = samplenexttup(node, scan);
+
+ if (tuple)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ scan->rs_cbuf, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+/*
+ * Check visibility of the tuple.
+ */
+static bool
+SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset, HeapScanDesc scan)
+{
+ /*
+ * If this scan is reading whole pages at a time, there is already
+ * visibility info present in rs_vistuples so we can just search it
+ * for the tupoffset.
+ */
+ if (scan->rs_pageatatime)
+ {
+ int start = 0,
+ end = scan->rs_ntuples - 1;
+
+ /*
+ * Do a binary search over rs_vistuples. It is already sorted by
+ * OffsetNumber, so we don't need to do any sorting ourselves here.
+ *
+ * We could use bsearch() here, but it's slower for integers because
+ * of the function call overhead, and since it needs boilerplate code
+ * it would not save us anything code-wise anyway.
+ */
+ while (start <= end)
+ {
+ int mid = start + (end - start) / 2;
+ OffsetNumber curoffset = scan->rs_vistuples[mid];
+
+ if (curoffset == tupoffset)
+ return true;
+ else if (curoffset > tupoffset)
+ end = mid - 1;
+ else
+ start = mid + 1;
+ }
+
+ return false;
+ }
+ else
+ {
+ /* No pagemode, we have to check the tuple itself. */
+ Snapshot snapshot = scan->rs_snapshot;
+ Buffer buffer = scan->rs_cbuf;
+
+ bool visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+ CheckForSerializableConflictOut(visible, scan->rs_rd, tuple, buffer,
+ snapshot);
+
+ return visible;
+ }
+}
+
+/*
+ * Read next tuple using the correct sampling method.
+ */
+static HeapTuple
+samplenexttup(SampleScanState *node, HeapScanDesc scan)
+{
+ HeapTuple tuple = &(scan->rs_ctup);
+ bool pagemode = scan->rs_pageatatime;
+ BlockNumber blockno;
+ Page page;
+ ItemId itemid;
+ OffsetNumber tupoffset,
+ maxoffset;
+
+ if (!scan->rs_inited)
+ {
+ /*
+ * return null immediately if relation is empty
+ */
+ if (scan->rs_nblocks == 0)
+ {
+ Assert(!BufferIsValid(scan->rs_cbuf));
+ tuple->t_data = NULL;
+ return NULL;
+ }
+ blockno = DatumGetInt32(FunctionCall1(&node->tsmnextblock,
+ PointerGetDatum(node)));
+ if (!BlockNumberIsValid(blockno))
+ {
+ tuple->t_data = NULL;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+ scan->rs_inited = true;
+ }
+ else
+ {
+ /* continue from previously returned page/tuple */
+ blockno = scan->rs_cblock; /* current page */
+ }
+
+ /*
+ * When pagemode is disabled, the scan will do visibility checks for each
+ * tuple it finds so the buffer needs to be locked.
+ */
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ maxoffset = PageGetMaxOffsetNumber(page);
+
+ for (;;)
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ tupoffset = DatumGetUInt16(FunctionCall3(&node->tsmnexttuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset)));
+
+ if (OffsetNumberIsValid(tupoffset))
+ {
+ bool visible;
+ bool found;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_data = (HeapTupleHeader) PageGetItem((Page) page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&(tuple->t_self), blockno, tupoffset);
+
+ visible = SampleTupleVisible(tuple, tupoffset, scan);
+
+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples for
+ * statistical purposes, but not return them since user should
+ * never see invisible tuples.
+ */
+ if (OidIsValid(node->tsmexaminetuple.fn_oid))
+ {
+ found = DatumGetBool(FunctionCall4(&node->tsmexaminetuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ PointerGetDatum(tuple),
+ BoolGetDatum(visible)));
+ /* Should not happen if sampling method is well written. */
+ if (found && !visible)
+ elog(ERROR, "sampling method wanted to return invisible tuple");
+ }
+ else
+ found = visible;
+
+ /* Found visible tuple, return it. */
+ if (found)
+ {
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ break;
+ }
+ else
+ {
+ /* Try next tuple from same page. */
+ continue;
+ }
+ }
+
+
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
+ blockno = DatumGetInt32(FunctionCall1(&node->tsmnextblock,
+ PointerGetDatum(node)));
+
+ /*
+ * Report our new scan position for synchronization purposes. We
+ * don't do that when moving backwards, however. That would just
+ * mess up any other forward-moving scanners.
+ *
+ * Note: we do this before checking for end of scan so that the
+ * final state of the position hint is back at the start of the
+ * rel. That's not strictly necessary, but otherwise when you run
+ * the same query multiple times the starting position would shift
+ * a little bit backwards on every invocation, which is confusing.
+ * We don't guarantee any specific ordering in general, though.
+ */
+ if (scan->rs_syncscan)
+ ss_report_location(scan->rs_rd, BlockNumberIsValid(blockno) ?
+ blockno : scan->rs_startblock);
+
+ /*
+ * Reached end of scan.
+ */
+ if (!BlockNumberIsValid(blockno))
+ {
+ if (BufferIsValid(scan->rs_cbuf))
+ ReleaseBuffer(scan->rs_cbuf);
+ scan->rs_cbuf = InvalidBuffer;
+ scan->rs_cblock = InvalidBlockNumber;
+ tuple->t_data = NULL;
+ scan->rs_inited = false;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ pgstat_count_heap_getnext(scan->rs_rd);
+
+ return &(scan->rs_ctup);
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation using the sampling method and returns
+ * the next qualifying tuple.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+ TableSampleClause *tablesample)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+
+ /*
+ * Even though we aren't going to do a conventional seqscan, it is useful
+ * to create a HeapScanDesc --- many of the fields in it are usable.
+ */
+ node->ss.ss_currentScanDesc =
+ heap_beginscan_ss(currentRelation, estate->es_snapshot, 0, NULL,
+ tablesample->tsmseqscan, tablesample->tsmpagemode);
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags, rte->tablesample);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ InitSamplingMethod(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell sampling function that we finished the scan.
+ */
+ FunctionCall1(&node->tsmend, PointerGetDatum(node));
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close heap scan
+ */
+ heap_endscan(node->ss.ss_currentScanDesc);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *node)
+{
+ heap_rescan(node->ss.ss_currentScanDesc, NULL);
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ FunctionCall1(&node->tsmreset, PointerGetDatum(node));
+
+ ExecScanReScan(&node->ss);
+}
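The control flow of samplenexttup above may be easier to follow in isolation. The following is a rough standalone sketch, not the patch's code: the sampling method hands out block numbers until it returns an invalid block, and for each block hands out tuple offsets until it returns an invalid offset; each offset is checked against a sorted list of visible offsets with the same binary search as SampleTupleVisible. ToySampler, toy_nextblock, toy_nexttuple and count_sampled are hypothetical names, the toy method simply picks every second offset, and the sketch counts matching tuples instead of returning them one at a time.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_BLOCK  UINT32_MAX   /* stands in for InvalidBlockNumber */
#define INVALID_OFFSET 0            /* stands in for InvalidOffsetNumber */

/* Hypothetical sampling method state: all blocks, every second offset. */
typedef struct ToySampler
{
    uint32_t nblocks;
    uint32_t next_block;
    uint16_t last_offset;
} ToySampler;

/* Analogue of the nextblock callback. */
uint32_t
toy_nextblock(ToySampler *s)
{
    s->last_offset = INVALID_OFFSET;    /* restart offsets on a new block */
    if (s->next_block >= s->nblocks)
        return INVALID_BLOCK;
    return s->next_block++;
}

/* Analogue of the nexttuple callback. */
uint16_t
toy_nexttuple(ToySampler *s, uint16_t maxoffset)
{
    unsigned next = (s->last_offset == INVALID_OFFSET) ?
        1u : (unsigned) s->last_offset + 2u;

    if (next > maxoffset)
        return INVALID_OFFSET;
    s->last_offset = (uint16_t) next;
    return s->last_offset;
}

/* Binary search over a sorted array of visible offsets. */
bool
offset_is_visible(const uint16_t *vis, int nvis, uint16_t off)
{
    int start = 0;
    int end = nvis - 1;

    while (start <= end)
    {
        int mid = start + (end - start) / 2;

        if (vis[mid] == off)
            return true;
        else if (vis[mid] > off)
            end = mid - 1;
        else
            start = mid + 1;
    }
    return false;
}

/* Driver loop; every block shares maxoffset and vis for simplicity. */
int
count_sampled(ToySampler *s, uint16_t maxoffset,
              const uint16_t *vis, int nvis)
{
    int returned = 0;
    uint32_t blockno = toy_nextblock(s);

    while (blockno != INVALID_BLOCK)
    {
        uint16_t off = toy_nexttuple(s, maxoffset);

        if (off != INVALID_OFFSET)
        {
            if (offset_is_visible(vis, nvis, off))
                returned++;
            continue;           /* try the next tuple on the same page */
        }
        blockno = toy_nextblock(s);     /* page exhausted, move on */
    }
    return returned;
}
```

In the real node the loop returns to the caller as soon as one tuple qualifies and resumes from scan->rs_cblock on the next call; the sketch folds that into a single pass to keep the example short.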
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 029761e..1a4c85b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -630,6 +630,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2015,6 +2031,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2147,6 +2164,40 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsmseqscan);
+ COPY_SCALAR_FIELD(tsmpagemode);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmexaminetuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_SCALAR_FIELD(tsmcost);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4084,6 +4135,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4732,6 +4786,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 190e50a..27626b5 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2318,6 +2318,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2437,6 +2438,36 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsmseqscan);
+ COMPARE_SCALAR_FIELD(tsmpagemode);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmexaminetuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_SCALAR_FIELD(tsmcost);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3155,6 +3186,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_FuncWithArgs:
retval = _equalFuncWithArgs(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index d6f1f5b..7742189 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3219,6 +3219,18 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->repeatable, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 385b289..e26dbf0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -580,6 +580,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -2404,6 +2412,36 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_BOOL_FIELD(tsmseqscan);
+ WRITE_BOOL_FIELD(tsmpagemode);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmexaminetuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2433,6 +2471,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2931,6 +2970,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3272,6 +3314,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 563209c..05ed9a8 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,46 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_BOOL_FIELD(tsmseqscan);
+ READ_BOOL_FIELD(tsmpagemode);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmexaminetuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_OID_FIELD(tsmcost);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1218,6 +1258,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1313,6 +1354,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..5f12477 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,10 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -265,6 +269,11 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_size(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Sampled relation */
+ set_tablesample_rel_size(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -332,6 +341,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +432,41 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_size
+ * Set size estimates for a sampled relation.
+ */
+static void
+set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ /* Mark rel with estimated output rows, width, etc */
+ set_baserel_size_estimates(root, rel);
+}
+
+/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+ Path *path;
+
+ /*
+ * We don't support pushing join clauses into the quals of a sample scan, but
+ * it could still have required parameterization due to LATERAL refs in
+ * its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ path = create_samplescan_path(root, rel, required_outer);
+ rel->pathlist = list_make1(path);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 1a0d358..51583a1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -90,6 +90,7 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
#include "utils/spccache.h"
+#include "utils/tablesample.h"
#include "utils/tuplesort.h"
@@ -220,6 +221,73 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * From the planner/optimizer perspective, we don't care all that much about
+ * the cost itself, since there is only ever one scan path to consider when a
+ * sampling scan is present, but the row count estimate is still important.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'path' is the path being costed; its param_info is NULL unless parameterized
+ */
+void
+cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+ BlockNumber pages;
+ double tuples;
+ RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
+ TableSampleClause *tablesample = rte->tablesample;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Mark the path with the correct row estimate */
+ if (path->param_info)
+ path->rows = path->param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ /* Call the sampling method's costing function. */
+ OidFunctionCall6(tablesample->tsmcost, PointerGetDatum(root),
+ PointerGetDatum(path), PointerGetDatum(baserel),
+ PointerGetDatum(tablesample->args),
+ PointerGetDatum(&pages), PointerGetDatum(&tuples));
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+
+ spc_page_cost = tablesample->tsmseqscan ? spc_seq_page_cost :
+ spc_random_page_cost;
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_page_cost * pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, path->param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * tuples;
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index cb69c03..3fc84e2 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -58,6 +58,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -100,6 +102,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -228,6 +231,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -343,6 +347,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -546,6 +557,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1133,6 +1145,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3321,6 +3372,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ec828cd..82771dc 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -445,6 +445,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index acfd0bc..9971b54 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2167,6 +2167,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index faca30b..ea7a47b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,26 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+Path *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ Path *pathnode = makeNode(Path);
+
+ pathnode->pathtype = T_SampleScan;
+ pathnode->parent = rel;
+ pathnode->param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->pathkeys = NIL; /* samplescan has unordered result */
+
+ cost_samplescan(pathnode, root, rel);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1778,6 +1798,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return (Path *) create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 88ec83c..711cdd5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -448,6 +448,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -615,8 +616,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10197,6 +10198,11 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample
+ {
+ RangeTableSample *n = (RangeTableSample *) $1;
+ $$ = (Node *) n;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10522,6 +10528,32 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr opt_alias_clause tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $3;
+ n->relation = $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = $4;
+ n->repeatable = $6;
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' a_expr ')' { $$ = (Node *) $3; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13612,6 +13644,7 @@ type_func_name_keyword:
| OVERLAPS
| RIGHT
| SIMILAR
+ | TABLESAMPLE
| VERBOSE
;
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 8d90b50..8318948 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -414,6 +417,29 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
+{
+ RangeTblEntry *rte;
+ TableSampleClause *tablesample = NULL;
+
+ /* We first need to build a range table entry */
+ rte = transformTableEntry(pstate, r->relation);
+
+ if (rte->relkind != RELKIND_RELATION &&
+ rte->relkind != RELKIND_MATVIEW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("TABLESAMPLE clause can only be used on tables and materialized views"),
+ parser_errposition(pstate,
+ exprLocation((Node *) r))));
+
+ tablesample = ParseTableSample(pstate, r->method, r->repeatable, r->args);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -1122,6 +1148,27 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 1385776..0610873 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesample_method.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -767,6 +769,135 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+extern TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
+ List *sampleargs)
+{
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, initnargs;
+ Oid actual_arg_types[FUNC_MAX_ARGS];
+ Oid init_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ samplemethod)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+
+ tablesample->tsmseqscan = tsm->tsmseqscan;
+ tablesample->tsmpagemode = tsm->tsmpagemode;
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmexaminetuple = tsm->tsmexaminetuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+ tablesample->tsmcost = tsm->tsmcost;
+
+ ReleaseSysCache(tuple);
+
+ /* Validate the parameters against init function definition. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ initnargs = procform->pronargs;
+ Assert(initnargs >= 3);
+
+ /*
+ * The first parameter is used to pass the SampleScanState and the second
+ * is the seed (REPEATABLE). Skip processing them here; just assert
+ * that their types are correct.
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ Assert(procform->proargtypes.values[1] == INT4OID);
+ initnargs -= 2;
+ memcpy(init_arg_types, procform->proargtypes.values + 2,
+ initnargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Process repeatable (seed) */
+ if (repeatable != NULL)
+ {
+ Node *arg = repeatable;
+
+ if (arg && IsA(arg, A_Const))
+ {
+ A_Const *con = (A_Const *) arg;
+
+ if (con->val.type == T_Null)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value"),
+ parser_errposition(pstate, con->location)));
+
+ }
+
+ arg = transformExpr(pstate, arg, EXPR_KIND_FROM_FUNCTION);
+ arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
+ tablesample->repeatable = arg;
+ }
+ else
+ tablesample->repeatable = NULL;
+
+ /* Transform the rest of arguments ... */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *arg = transformExpr(pstate, (Node *) lfirst(larg), EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ fargs = lappend(fargs, arg);
+
+ actual_arg_types[nargs++] = argtype;
+ }
+
+ /*
+ * Check if parameters are correct.
+ *
+ * XXX: can we do better at hinting here?
+ */
+ if (initnargs != nargs ||
+ !can_coerce_type(initnargs, actual_arg_types, init_arg_types,
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameters for tablesample method \"%s\"",
+ samplemethod)));
+
+ /* perform the necessary typecasting of arguments */
+ make_fn_arguments(pstate, fargs, actual_arg_types, init_arg_types);
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..9daa2ae 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,8 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time \
+ tablesample
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 29b5b1b..a6bd34c 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -32,6 +32,7 @@
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -344,6 +345,8 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags);
static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags, int wrapColumn);
+static void get_tablesample_def(TableSampleClause *tablesample,
+ deparse_context *context);
static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
TupleDesc resultDesc,
int prettyFlags, int wrapColumn, int startIndent);
@@ -4185,6 +4188,50 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
heap_close(ev_relation, AccessShareLock);
}
+/* ----------
+ * get_tablesample_def - Convert TableSampleClause back to SQL
+ * ----------
+ */
+static void
+get_tablesample_def(TableSampleClause *tablesample, deparse_context *context)
+{
+ StringInfo buf = context->buf;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ char *tsmname;
+ int nargs;
+ ListCell *l;
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tablesample->tsmid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("cache lookup failed for tablesample method %u",
+ tablesample->tsmid)));
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ tsmname = NameStr(tsm->tsmname);
+ appendStringInfo(buf, " TABLESAMPLE %s (", quote_identifier(tsmname));
+
+ ReleaseSysCache(tuple);
+
+ nargs = 0;
+ foreach(l, tablesample->args)
+ {
+ if (nargs++ > 0)
+ appendStringInfoString(buf, ", ");
+ get_rule_expr((Node *) lfirst(l), context, true);
+ }
+ appendStringInfoChar(buf, ')');
+
+ if (tablesample->repeatable != NULL)
+ {
+ appendStringInfoString(buf, " REPEATABLE (");
+ get_rule_expr(tablesample->repeatable, context, true);
+ appendStringInfoChar(buf, ')');
+ }
+}
/* ----------
* get_query_def - Parse back one query parsetree
@@ -8453,6 +8500,9 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
only_marker(rte),
generate_relation_name(rte->relid,
context->namespaces));
+
+ if (rte->tablesample)
+ get_tablesample_def(rte->tablesample, context);
break;
case RTE_SUBQUERY:
/* Subquery RTE */
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..3a8f01e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesample_method_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index 1eeabaf..f213c46 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -46,6 +46,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
bs->n = samplesize;
bs->t = 0; /* blocks scanned so far */
bs->m = 0; /* blocks selected so far */
+
+ sampler_random_init_state(randseed, bs->randstate);
}
bool
@@ -92,7 +94,7 @@ BlockSampler_Next(BlockSampler bs)
* less than k, which means that we cannot fail to select enough blocks.
*----------
*/
- V = sampler_random_fract();
+ V = sampler_random_fract(bs->randstate);
p = 1.0 - (double) k / (double) K;
while (V < p)
{
@@ -126,8 +128,14 @@ BlockSampler_Next(BlockSampler bs)
void
reservoir_init_selection_state(ReservoirState rs, int n)
{
+ /*
+ * Reservoir sampling is not used anywhere where it would need to return
+ * repeatable results so we can initialize it randomly.
+ */
+ sampler_random_init_state(random(), rs->randstate);
+
/* Initial value of W (for use when Algorithm Z is first applied) */
- *rs = exp(-log(sampler_random_fract()) / n);
+ rs->W = exp(-log(sampler_random_fract(rs->randstate)) / n);
}
double
@@ -142,7 +150,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
double V,
quot;
- V = sampler_random_fract(); /* Generate V */
+ V = sampler_random_fract(rs->randstate); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -158,7 +166,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
else
{
/* Now apply Algorithm Z */
- double W = *rs;
+ double W = rs->W;
double term = t - (double) n + 1;
for (;;)
@@ -174,7 +182,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
tmp;
/* Generate U and X */
- U = sampler_random_fract();
+ U = sampler_random_fract(rs->randstate);
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -203,11 +211,11 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract(rs->randstate)) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
- *rs = W;
+ rs->W = W;
}
return S;
}
@@ -217,10 +225,17 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
* Random number generator used by sampling
*----------
*/
+void
+sampler_random_init_state(long seed, SamplerRandomState randstate)
+{
+ randstate[0] = 0x330e;
+ randstate[1] = (unsigned short) seed;
+ randstate[2] = (unsigned short) (seed >> 16);
+}
/* Select a random value R uniformly distributed in (0 - 1) */
double
-sampler_random_fract()
+sampler_random_fract(SamplerRandomState randstate)
{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+ return pg_erand48(randstate);
}
diff --git a/src/backend/utils/tablesample/Makefile b/src/backend/utils/tablesample/Makefile
new file mode 100644
index 0000000..df92939
--- /dev/null
+++ b/src/backend/utils/tablesample/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/tablesample
+#
+# IDENTIFICATION
+# src/backend/utils/tablesample/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = system.o bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/tablesample/bernoulli.c b/src/backend/utils/tablesample/bernoulli.c
new file mode 100644
index 0000000..36f4bcb
--- /dev/null
+++ b/src/backend/utils/tablesample/bernoulli.c
@@ -0,0 +1,224 @@
+/*-------------------------------------------------------------------------
+ *
+ * bernoulli.c
+ * interface routines for BERNOULLI tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/relscan.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* number of blocks */
+ BlockNumber blockno; /* current block */
+ float4 probability; /* probability that tuple will be returned (0.0-1.0) */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} BernoulliSamplerData;
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ HeapScanDesc scan = scanstate->ss.ss_currentScanDesc;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->startblock = scan->rs_startblock;
+ sampler->nblocks = scan->rs_nblocks;
+ sampler->blockno = InvalidBlockNumber;
+ sampler->probability = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ /*
+ * Bernoulli sampling scans all blocks on the table and supports
+ * syncscan so loop from startblock to startblock instead of
+ * from 0 to nblocks.
+ */
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = sampler->startblock;
+ else
+ {
+ sampler->blockno++;
+
+ if (sampler->blockno >= sampler->nblocks)
+ sampler->blockno = 0;
+
+ if (sampler->blockno == sampler->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ *
+ * This method implements the main logic of Bernoulli sampling.
+ * The algorithm simply generates a new random number (in the 0.0-1.0 range)
+ * and, if it falls within the user-specified probability (in the same range),
+ * returns the tuple offset.
+ *
+ * If we reach end of the block return InvalidOffsetNumber which tells
+ * SampleScan to go to next block.
+ */
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ float4 probability = sampler->probability;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /*
+ * Loop over tuple offsets until the random generator returns a value that
+ * is within the probability of returning the tuple, or until we reach
+ * the end of the block.
+ *
+ * (This is our implementation of a Bernoulli trial.)
+ */
+ while (sampler_random_fract(sampler->randstate) > probability)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_bernoulli_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ *pages = baserel->pages;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1f;
+ }
+
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/backend/utils/tablesample/system.c b/src/backend/utils/tablesample/system.c
new file mode 100644
index 0000000..07d1f3a
--- /dev/null
+++ b/src/backend/utils/tablesample/system.c
@@ -0,0 +1,185 @@
+/*-------------------------------------------------------------------------
+ *
+ * system.c
+ * interface routines for system tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/*
+ * State
+ */
+typedef struct
+{
+ BlockSamplerData bs;
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ int samplesize; /* number of blocks to return */
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks =
+ RelationGetNumberOfBlocks(scanstate->ss.ss_currentRelation);
+ sampler->samplesize = 1 + (int) (sampler->tblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses the same logic as ANALYZE for picking the random blocks.
+ */
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1f;
+ }
+
+ *pages = baserel->pages * samplesize;
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 888cce7..69cc702 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -113,8 +113,12 @@ extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot,
bool allow_strat, bool allow_sync);
extern HeapScanDesc heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key);
+extern HeapScanDesc heap_beginscan_ss(Relation relation, Snapshot snapshot,
+ int nkeys, ScanKey key,
+ bool allow_strat, bool allow_pagemode);
extern void heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk,
BlockNumber endBlk);
+extern void heapgetpage(HeapScanDesc scan, BlockNumber page);
extern void heap_rescan(HeapScanDesc scan, ScanKey key);
extern void heap_endscan(HeapScanDesc scan);
extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 9bb6362..e2b2b4f 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -29,6 +29,7 @@ typedef struct HeapScanDescData
int rs_nkeys; /* number of scan keys */
ScanKey rs_key; /* array of scan key descriptors */
bool rs_bitmapscan; /* true if this is really a bitmap scan */
+ bool rs_samplescan; /* true if this is really a sample scan */
bool rs_pageatatime; /* verify visibility page-at-a-time? */
bool rs_allow_strat; /* allow or disallow use of access strategy */
bool rs_allow_sync; /* allow or disallow use of syncscan */
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..e01bd0c 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_name_index, 3291, on pg_tablesample_method using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3291
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_oid_index, 3292, on pg_tablesample_method using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3292
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index d90ecc5..91aab0d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5182,6 +5182,31 @@ DESCR("rank of hypothetical row without gaps");
DATA(insert OID = 3993 ( dense_rank_final PGNSP PGUID 12 1 0 2276 0 f f f f f f i 2 0 20 "2281 2276" "{2281,2276}" "{i,v}" _null_ _null_ hypothetical_dense_rank_final _null_ _null_ _null_ ));
DESCR("aggregate final function");
+DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3296 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3297 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3298 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3299 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+DATA(insert OID = 3300 ( tsm_system_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_system_cost _null_ _null_ _null_ ));
+DESCR("tsm_system_cost(internal)");
+
+DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3302 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3303 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3304 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3306 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
+DATA(insert OID = 3307 ( tsm_bernoulli_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_cost(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
new file mode 100644
index 0000000..ec826e3
--- /dev/null
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -0,0 +1,77 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesample_method.h
+ * definition of the tablesample methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesample_method.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLE_METHOD_H
+#define PG_TABLESAMPLE_METHOD_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_tablesample_method definition. cpp turns this into
+ * typedef struct FormData_pg_tablesample_method
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3290
+
+CATALOG(pg_tablesample_method,3290)
+{
+ NameData tsmname; /* tablesample method name */
+ bool tsmseqscan; /* does this method scan the whole table sequentially? */
+ bool tsmpagemode; /* does this method scan a page at a time? */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockNumber if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmexaminetuple; /* optional function which can examine tuple contents and
+ decide if tuple should be returned or not */
+ regproc tsmend; /* end scan function */
+ regproc tsmreset; /* reset state - used by rescan */
+ regproc tsmcost; /* costing function */
+} FormData_pg_tablesample_method;
+
+/* ----------------
+ * Form_pg_tablesample_method corresponds to a pointer to a tuple with
+ * the format of pg_tablesample_method relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
+
+/* ----------------
+ * compiler constants for pg_tablesample_method
+ * ----------------
+ */
+#define Natts_pg_tablesample_method 10
+#define Anum_pg_tablesample_method_tsmname 1
+#define Anum_pg_tablesample_method_tsmseqscan 2
+#define Anum_pg_tablesample_method_tsmpagemode 3
+#define Anum_pg_tablesample_method_tsminit 4
+#define Anum_pg_tablesample_method_tsmnextblock 5
+#define Anum_pg_tablesample_method_tsmnexttuple 6
+#define Anum_pg_tablesample_method_tsmexaminetuple 7
+#define Anum_pg_tablesample_method_tsmend 8
+#define Anum_pg_tablesample_method_tsmreset 9
+#define Anum_pg_tablesample_method_tsmcost 10
+
+/* ----------------
+ * initial contents of pg_tablesample_method
+ * ----------------
+ */
+
+DATA(insert OID = 3293 ( system false true tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3294 ( bernoulli true false tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ac75f86..20edee4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1216,6 +1216,24 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+
+ /* Sampling method functions. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmexaminetuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+
+ void *tsmdata; /* for use by the tablesample method */
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 38469ef..caaedbf 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,6 +61,7 @@ typedef enum NodeTag
T_ValuesScan,
T_CteScan,
T_WorkTableScan,
+ T_SampleScan,
T_ForeignScan,
T_CustomScan,
T_Join,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -414,6 +416,8 @@ typedef enum NodeTag
T_WithClause,
T_CommonTableExpr,
T_RoleSpec,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 0e257ac..a4288d1 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -334,6 +334,26 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - information about a sampling method
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ bool tsmseqscan;
+ bool tsmpagemode;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmexaminetuple;
+ Oid tsmend;
+ Oid tsmreset;
+ Oid tsmcost;
+ Node *repeatable;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -534,6 +554,21 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than the SQL standard, so we pass generic function
+ * arguments to the sampling method.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ Node *repeatable;
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -769,6 +804,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 21cbfa8..ddc3708 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -279,6 +279,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..24003ae 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9923f0e..89c8ded 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c243ec..ae90df8 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 3264691..6727e55 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,10 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod,
+ Node *repeatable, List *args);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 6bd786d..185bd81 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index e3e7f9c..4ac208d 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -15,7 +15,12 @@
#include "storage/bufmgr.h"
-extern double sampler_random_fract(void);
+/* Random generator for sampling code */
+typedef unsigned short SamplerRandomState[3];
+
+extern void sampler_random_init_state(long seed,
+ SamplerRandomState randstate);
+extern double sampler_random_fract(SamplerRandomState randstate);
/* Block sampling methods */
/* Data structure for Algorithm S from Knuth 3.4.2 */
@@ -25,6 +30,7 @@ typedef struct
int n; /* desired sample size */
BlockNumber t; /* current block number */
int m; /* blocks selected so far */
+ SamplerRandomState randstate; /* random generator state */
} BlockSamplerData;
typedef BlockSamplerData *BlockSampler;
@@ -35,7 +41,12 @@ extern bool BlockSampler_HasMore(BlockSampler bs);
extern BlockNumber BlockSampler_Next(BlockSampler bs);
/* Reservoid sampling methods */
-typedef double ReservoirStateData;
+typedef struct
+{
+ double W;
+ SamplerRandomState randstate; /* random generator state */
+} ReservoirStateData;
+
typedef ReservoirStateData *ReservoirState;
extern void reservoir_init_selection_state(ReservoirState rs, int n);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..6b628f6 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/include/utils/tablesample.h b/src/include/utils/tablesample.h
new file mode 100644
index 0000000..1a24cb6
--- /dev/null
+++ b/src/include/utils/tablesample.h
@@ -0,0 +1,27 @@
+/*--------------------------------------------------------------------------
+ * tablesample.h
+ * Header file for builtin table sampling methods.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/utils/tablesample.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TABLESAMPLE_H
+#define TABLESAMPLE_H
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_system_cost(PG_FUNCTION_ARGS);
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_cost(PG_FUNCTION_ARGS);
+
+#endif /* TABLESAMPLE_H */
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..5946edf 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesample_method|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..5ba23c3
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,168 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT t.id FROM test_tablesample AS t TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+ count
+-------
+ 10
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+ pg_get_viewdef
+--------------------------------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system (((10 * 2))::real) REPEATABLE (2);
+(1 row)
+
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+ pg_get_viewdef
+-----------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system ((99)::real);
+(1 row)
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+CLOSE tablesample_cur;
+END;
+-- should fail
+SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1);
+ERROR: TABLESAMPLE clause can only be used on tables and materialized views
+DROP TABLE test_tablesample CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to view test_tablesample_v1
+drop cascades to view test_tablesample_v2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 6d3b865..300e1fb 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address tablesample
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 8326894..d815496 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -153,3 +153,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..d0c069c
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,42 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT t.id FROM test_tablesample AS t TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+CLOSE tablesample_cur;
+END;
+
+-- should fail
+SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1);
+
+DROP TABLE test_tablesample CASCADE;
--
1.9.1
0001-separate-block-sampling-functions-v2.patch (text/x-diff)
From b3262d29606aaf6df97d48267658f6b2469791ac Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:36:56 +0100
Subject: [PATCH 1/4] separate block sampling functions v2
---
contrib/file_fdw/file_fdw.c | 9 +-
contrib/postgres_fdw/postgres_fdw.c | 10 +-
src/backend/commands/analyze.c | 225 +----------------------------------
src/backend/utils/misc/Makefile | 2 +-
src/backend/utils/misc/sampling.c | 226 ++++++++++++++++++++++++++++++++++++
src/include/commands/vacuum.h | 3 -
src/include/utils/sampling.h | 44 +++++++
7 files changed, 287 insertions(+), 232 deletions(-)
create mode 100644 src/backend/utils/misc/sampling.c
create mode 100644 src/include/utils/sampling.h
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 4368897..249d541 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -34,6 +34,7 @@
#include "optimizer/var.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -1005,7 +1006,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
{
int numrows = 0;
double rowstoskip = -1; /* -1 means not set yet */
- double rstate;
+ ReservoirStateData rstate;
TupleDesc tupDesc;
Datum *values;
bool *nulls;
@@ -1043,7 +1044,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
ALLOCSET_DEFAULT_MAXSIZE);
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
@@ -1087,7 +1088,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* not-yet-incremented value of totalrows as t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(*totalrows, targrows, &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, *totalrows, targrows);
if (rowstoskip <= 0)
{
@@ -1095,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 478e124..74ef792 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -37,6 +37,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -202,7 +203,7 @@ typedef struct PgFdwAnalyzeState
/* for random sampling */
double samplerows; /* # of rows fetched */
double rowstoskip; /* # of rows to skip before next sample */
- double rstate; /* random state */
+ ReservoirStateData rstate; /* state for reservoir sampling*/
/* working memory contexts */
MemoryContext anl_cxt; /* context for per-analyze lifespan data */
@@ -2397,7 +2398,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
astate.numrows = 0;
astate.samplerows = 0;
astate.rowstoskip = -1; /* -1 means not set yet */
- astate.rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&astate.rstate, targrows);
/* Remember ANALYZE context, and create a per-tuple temp context */
astate.anl_cxt = CurrentMemoryContext;
@@ -2537,13 +2538,12 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
* analyze.c; see Jeff Vitter's paper.
*/
if (astate->rowstoskip < 0)
- astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
- &astate->rstate);
+ astate->rowstoskip = reservoir_get_next_S(&astate->rstate, astate->samplerows, targrows);
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * anl_random_fract());
+ pos = (int) (targrows * sampler_random_fract());
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d4d1914..5730f26 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -88,10 +78,6 @@ static BufferAccessStrategy vac_strategy;
static void do_analyze_rel(Relation onerel, int options, List *va_cols,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -948,94 +934,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1081,7 +979,7 @@ acquire_sample_rows(Relation onerel, int elevel,
BlockNumber totalblocks;
TransactionId OldestXmin;
BlockSamplerData bs;
- double rstate;
+ ReservoirStateData rstate;
Assert(targrows > 0);
@@ -1091,9 +989,9 @@ acquire_sample_rows(Relation onerel, int elevel,
OldestXmin = GetOldestXmin(onerel, true);
/* Prepare for sampling block numbers */
- BlockSampler_Init(&bs, totalblocks, targrows);
+ BlockSampler_Init(&bs, totalblocks, targrows, random());
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Outer loop over blocks to sample */
while (BlockSampler_HasMore(&bs))
@@ -1241,8 +1139,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(samplerows, targrows,
- &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, samplerows, targrows);
if (rowstoskip <= 0)
{
@@ -1250,7 +1147,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1309,116 +1206,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
-/*
- * These two routines embody Algorithm Z from "Random sampling with a
- * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
- * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
- * of the count S of records to skip before processing another record.
- * It is computed primarily based on t, the number of records already read.
- * The only extra state needed between calls is W, a random state variable.
- *
- * anl_init_selection_state computes the initial W value.
- *
- * Given that we've already read t records (t >= n), anl_get_next_S
- * determines the number of records to skip before the next record is
- * processed.
- */
-double
-anl_init_selection_state(int n)
-{
- /* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
-}
-
-double
-anl_get_next_S(double t, int n, double *stateptr)
-{
- double S;
-
- /* The magic constant here is T from Vitter's paper */
- if (t <= (22.0 * n))
- {
- /* Process records using Algorithm X until t is large enough */
- double V,
- quot;
-
- V = anl_random_fract(); /* Generate V */
- S = 0;
- t += 1;
- /* Note: "num" in Vitter's code is always equal to t - n */
- quot = (t - (double) n) / t;
- /* Find min S satisfying (4.1) */
- while (quot > V)
- {
- S += 1;
- t += 1;
- quot *= (t - (double) n) / t;
- }
- }
- else
- {
- /* Now apply Algorithm Z */
- double W = *stateptr;
- double term = t - (double) n + 1;
-
- for (;;)
- {
- double numer,
- numer_lim,
- denom;
- double U,
- X,
- lhs,
- rhs,
- y,
- tmp;
-
- /* Generate U and X */
- U = anl_random_fract();
- X = t * (W - 1.0);
- S = floor(X); /* S is tentatively set to floor(X) */
- /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
- tmp = (t + 1) / term;
- lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
- rhs = (((t + X) / (term + S)) * term) / t;
- if (lhs <= rhs)
- {
- W = rhs / lhs;
- break;
- }
- /* Test if U <= f(S)/cg(X) */
- y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
- if ((double) n < S)
- {
- denom = t;
- numer_lim = term + S;
- }
- else
- {
- denom = t - (double) n + S;
- numer_lim = t + 1;
- }
- for (numer = t + S; numer >= numer_lim; numer -= 1)
- {
- y *= numer / denom;
- denom -= 1;
- }
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
- if (exp(log(y) / n) <= (t + X) / t)
- break;
- }
- *stateptr = W;
- }
- return S;
-}
-
/*
* qsort comparator for sorting rows[] array
*/
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 378b77e..7889101 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o rls.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..1eeabaf
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Relation block sampling routines.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler provides algorithm for block level sampling of a relation
+ * as discussed on pgsql-hackers 2004-04-02 (subject "Large DB")
+ * It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
+ long randseed)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires an sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+/*
+ * These two routines embody Algorithm Z from "Random sampling with a
+ * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
+ * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
+ * of the count S of records to skip before processing another record.
+ * It is computed primarily based on t, the number of records already read.
+ * The only extra state needed between calls is W, a random state variable.
+ *
+ * reservoir_init_selection_state computes the initial W value.
+ *
+ * Given that we've already read t records (t >= n), reservoir_get_next_S
+ * determines the number of records to skip before the next record is
+ * processed.
+ */
+void
+reservoir_init_selection_state(ReservoirState rs, int n)
+{
+ /* Initial value of W (for use when Algorithm Z is first applied) */
+ *rs = exp(-log(sampler_random_fract()) / n);
+}
+
+double
+reservoir_get_next_S(ReservoirState rs, double t, int n)
+{
+ double S;
+
+ /* The magic constant here is T from Vitter's paper */
+ if (t <= (22.0 * n))
+ {
+ /* Process records using Algorithm X until t is large enough */
+ double V,
+ quot;
+
+ V = sampler_random_fract(); /* Generate V */
+ S = 0;
+ t += 1;
+ /* Note: "num" in Vitter's code is always equal to t - n */
+ quot = (t - (double) n) / t;
+ /* Find min S satisfying (4.1) */
+ while (quot > V)
+ {
+ S += 1;
+ t += 1;
+ quot *= (t - (double) n) / t;
+ }
+ }
+ else
+ {
+ /* Now apply Algorithm Z */
+ double W = *rs;
+ double term = t - (double) n + 1;
+
+ for (;;)
+ {
+ double numer,
+ numer_lim,
+ denom;
+ double U,
+ X,
+ lhs,
+ rhs,
+ y,
+ tmp;
+
+ /* Generate U and X */
+ U = sampler_random_fract();
+ X = t * (W - 1.0);
+ S = floor(X); /* S is tentatively set to floor(X) */
+ /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
+ tmp = (t + 1) / term;
+ lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
+ rhs = (((t + X) / (term + S)) * term) / t;
+ if (lhs <= rhs)
+ {
+ W = rhs / lhs;
+ break;
+ }
+ /* Test if U <= f(S)/cg(X) */
+ y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
+ if ((double) n < S)
+ {
+ denom = t;
+ numer_lim = term + S;
+ }
+ else
+ {
+ denom = t - (double) n + S;
+ numer_lim = t + 1;
+ }
+ for (numer = t + S; numer >= numer_lim; numer -= 1)
+ {
+ y *= numer / denom;
+ denom -= 1;
+ }
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ if (exp(log(y) / n) <= (t + X) / t)
+ break;
+ }
+ *rs = W;
+ }
+ return S;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+
+/* Select a random value R uniformly distributed in (0 - 1) */
+double
+sampler_random_fract()
+{
+ return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+}
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 9fd2516..27f5195 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -194,8 +194,5 @@ extern void analyze_rel(Oid relid, RangeVar *relation, int options,
List *va_cols, bool in_outer_xact,
BufferAccessStrategy bstrategy);
extern bool std_typanalyze(VacAttrStats *stats);
-extern double anl_random_fract(void);
-extern double anl_init_selection_state(int n);
-extern double anl_get_next_S(double t, int n, double *stateptr);
#endif /* VACUUM_H */
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..e3e7f9c
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+extern double sampler_random_fract(void);
+
+/* Block sampling methods */
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize, long randseed);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Reservoir sampling methods */
+typedef double ReservoirStateData;
+typedef ReservoirStateData *ReservoirState;
+
+extern void reservoir_init_selection_state(ReservoirState rs, int n);
+extern double reservoir_get_next_S(ReservoirState rs, double t, int n);
+
+#endif /* SAMPLING_H */
--
1.9.1
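The block-sampling code that the patch above moves into sampling.c follows Knuth's Algorithm S. As a rough standalone illustration, the sketch below shows the naive form of that algorithm (one random draw per block), which the comment in BlockSampler_Next describes before optimizing it down to one draw per selected block. The ToyRng type and the toy_* names are made up for this sketch, and the small LCG only stands in for sampler_random_fract() so that a run is repeatable; none of it is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tiny PRNG so the sketch is deterministic; not PostgreSQL's. */
typedef struct
{
	uint64_t	state;
} ToyRng;

/* Return a value in [0, 1) from a 64-bit LCG step (top 53 bits). */
static double
toy_random_fract(ToyRng *rng)
{
	rng->state = rng->state * 6364136223846793005ULL + 1442695040888963407ULL;
	return (double) (rng->state >> 11) / 9007199254740992.0;
}

/*
 * Naive Algorithm S: walk the block numbers 0..nblocks-1 in order and keep
 * block t with probability k/K, where K is the number of blocks not yet
 * visited and k is the number of blocks still to be selected.
 * Writes the chosen block numbers (ascending) to out[] and returns how
 * many were chosen, which is min(nsample, nblocks).
 */
static uint32_t
toy_select_blocks(uint32_t nblocks, uint32_t nsample, ToyRng *rng,
				  uint32_t *out)
{
	uint32_t	m = 0;			/* blocks selected so far */

	assert(rng != NULL && out != NULL);

	for (uint32_t t = 0; t < nblocks && m < nsample; t++)
	{
		uint32_t	K = nblocks - t;	/* remaining blocks */
		uint32_t	k = nsample - m;	/* blocks still to sample */

		/* When k >= K we need all the rest, so no random draw is needed. */
		if (k >= K || toy_random_fract(rng) * (double) K < (double) k)
			out[m++] = t;
	}
	return m;
}
```

Calling toy_select_blocks(100, 10, &rng, out) with any seed fills out[] with exactly 10 distinct block numbers in ascending order; asking for more blocks than exist simply returns every block.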
On Fri, Apr 3, 2015 at 3:06 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Hi,
so here is version 11.
Thanks, the patch looks much better, but I think a few more
things still need to be discussed/fixed.
1.
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
Why do you want to allow func_arg_list?
Basically, if a user tries to pass multiple arguments in the
TABLESAMPLE method's clause, like (10,20), then I think
that should be caught at the grammar level as an error.
The SQL:2003 spec says that the argument to REPEATABLE and to the
TABLESAMPLE method is the same <numeric value expr>.
It seems to me that you want to allow it in order to make it extensible
to user-defined tablesample methods, but I am not sure it is
right to use func_arg_list for that, because sample method
arguments can have a different definition than function arguments.
Even if we want to keep sample method arguments the same as
function arguments, that sounds like a separate discussion.
In general, I feel you already have good basic infrastructure for
supporting other sample methods, but it is better to keep the new
DDL for that as a separate patch, as that
way we can reduce the scope of this patch. OTOH, if you or others
feel that it is mandatory to have the new DDL support for other
tablesample methods, then we have to deal with this now.
2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id = 2;
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...
postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
Isn't the TABLESAMPLE clause supposed to work with UPDATE/DELETE
statements?
3.
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
..
+ parser_errposition(pstate,
+ exprLocation((Node *) r))));
Better to align exprLocation() with pstate
4.
SampleTupleVisible()
{
..
else
{
/* No pagemode, we have to check the tuple itself. */
Snapshot snapshot = scan->rs_snapshot;
Buffer buffer = scan->rs_cbuf;
bool visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}
I think it is better to check PageIsAllVisible() in the above
code before the visibility check, as that can avoid visibility checks.
5.
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List *sampleargs)
{
..
if (con->val.type == T_Null)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("REPEATABLE clause must be NOT NULL numeric value"),
parser_errposition(pstate, con->location)));
..
}
InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
{
..
if (fcinfo.argnull[1])
ereport(ERROR,
(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
errmsg("REPEATABLE clause cannot be NULL")));
..
}
I think it would be better if we had the same error message,
and probably the same error code, in both of the above cases.
6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List *sampleargs)
It is better to expose function (usage of extern) via header file.
Is there a need to mention extern here?
7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}
What is the reason for coercing the value of the REPEATABLE clause to INT4OID
when a numeric value is expected for the clause? If the user gives an
input value of -2.3, it will generate a seed, which doesn't seem to
be right.
8.
+DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
Datatype for second argument is kept as 700 (Float4), shouldn't
it be 1700 (Numeric)?
9.
postgres=# explain SELECT t1.id FROM test_tablesample as t1 TABLESAMPLE SYSTEM (10),
test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
QUERY PLAN
----------------------------------------------------------------------------
Nested Loop (cost=0.00..4.05 rows=100 width=4)
-> Sample Scan on test_tablesample t1 (cost=0.00..0.01 rows=1 width=4)
-> Sample Scan on test_tablesample t2 (cost=0.00..4.02 rows=2 width=0)
(3 rows)
Wouldn't it be better to display the sample method name in EXPLAIN?
I think it can help in the case of join queries.
We could use the format below:
Sample Scan (System) on test_tablesample ...
10. Bernoulli.c
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use ths for syncscan support */
Typo: s/ths/this/
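A minimal sketch of the idea in point 4 above, for illustration only: when the page is flagged all-visible, the per-tuple visibility test can be skipped entirely. ToyPage, ToyTuple and the toy_* functions are hypothetical stand-ins and do not correspond to PostgreSQL's Page, HeapTuple, PageIsAllVisible() or HeapTupleSatisfiesVisibility(); the real scan code may need further conditions before trusting the page-level flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not PostgreSQL's structures. */
typedef struct
{
	bool		visible;		/* what a per-tuple snapshot test would return */
} ToyTuple;

typedef struct
{
	bool		all_visible;	/* models the page-level all-visible flag */
	size_t		ntuples;
	ToyTuple	tuples[8];
} ToyPage;

/* Counts how often the (expensive) per-tuple test was actually run. */
static int	toy_visibility_checks = 0;

static bool
toy_tuple_satisfies_visibility(const ToyTuple *tup)
{
	toy_visibility_checks++;
	return tup->visible;
}

/* Check the cheap page-level flag first; fall back to the tuple test. */
static bool
toy_sample_tuple_visible(const ToyPage *page, size_t idx)
{
	assert(idx < page->ntuples);

	if (page->all_visible)
		return true;
	return toy_tuple_satisfies_visibility(&page->tuples[idx]);
}
```

With an all-visible page, toy_visibility_checks stays at zero no matter how many tuples are sampled from it; only pages without the flag pay for the per-tuple test.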
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 04/04/15 14:57, Amit Kapila wrote:
1.
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
Why do you want to allow func_arg_list?
Basically, if a user tries to pass multiple arguments in the
TABLESAMPLE method's clause, like (10,20), then I think
that should be caught at the grammar level as an error.
It will be reported as an error during parse analysis if the TABLESAMPLE
method does not accept two parameters, the same as when, for example, an
expression has the wrong type.
The SQL:2003 spec says that the argument to REPEATABLE and to the
TABLESAMPLE method is the same <numeric value expr>.
It seems to me that you want to allow it in order to make it extensible
to user-defined tablesample methods, but I am not sure it is
right to use func_arg_list for that, because sample method
arguments can have a different definition than function arguments.
Even if we want to keep sample method arguments the same as
function arguments, that sounds like a separate discussion.
Yes, I want extensibility here. And I think the tablesample method
arguments are the same thing as function arguments, given that in the end
they are arguments for the init function of the tablesampling method.
I would be OK with just expr_list; naming parameters here isn't overly
important, but the ability to have different types and numbers of parameters
for custom TABLESAMPLE methods *is* important.
In general, I feel you already have good basic infrastructure for
supporting other sample methods, but it is better to keep the new
DDL for that as a separate patch, as that
way we can reduce the scope of this patch. OTOH, if you or others
feel that it is mandatory to have the new DDL support for other
tablesample methods, then we have to deal with this now.
Well I did attach it as separate diff mainly for that reason. I agree
that DDL does not have to be committed immediately with the rest of the
patch (although it's the simplest part of the patch IMHO).
2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id = 2;
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...
postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
Isn't the TABLESAMPLE clause supposed to work with UPDATE/DELETE
statements?
It's supported in the FROM part of UPDATE and the USING part of DELETE. I
think that's sufficient.
The standard is somewhat useless for UPDATE and DELETE, as it only defines
quite limited syntax there. From what I've seen when doing research,
MSSQL also only supports it in their equivalent of the FROM/USING list,
Oracle does not seem to support their SAMPLING clause outside of SELECTs
at all, and if I read the cryptic DB2 manual correctly, I think they don't
support it outside of (sub)SELECTs either.
4.
SampleTupleVisible()
{
..
else
{
/* No pagemode, we have to check the tuple itself. */
Snapshot snapshot = scan->rs_snapshot;
Buffer buffer = scan->rs_cbuf;
bool visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}
I think it is better to check PageIsAllVisible() in the above
code before the visibility check, as that can avoid visibility checks.
It's probably even better to do that one level up in the samplenexttup()
and only call the SampleTupleVisible if page is not allvisible
(PageIsAllVisible() is cheap).
6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List *sampleargs)
It is better to expose function (usage of extern) via header file.
Is there a need to mention extern here?
Eh, stupid copy-paste error when copying function name from header to
actual source file.
7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}
What is the reason for coercing the value of the REPEATABLE clause to INT4OID
when a numeric value is expected for the clause? If the user gives an
input value of -2.3, it will generate a seed, which doesn't seem to
be right.
Because REPEATABLE takes a numeric expression, it can produce any
number, but we need an int4 internally (well, float4 would also work, just the
code would be slightly uglier). And we do this type of coercion even for
table data (you can insert -2.3 into an integer column and it will work), so
I don't see what's wrong with it here.
8.
+DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
Datatype for second argument is kept as 700 (Float4), shouldn't
it be 1700 (Numeric)?
Why is that? Given the sampling error I think the float4 is enough for
specifying the percentage and it makes the calculations much easier and
faster than dealing with Numeric would.
9.
postgres=# explain SELECT t1.id FROM test_tablesample as t1 TABLESAMPLE SYSTEM (10),
test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
QUERY PLAN
----------------------------------------------------------------------------
Nested Loop (cost=0.00..4.05 rows=100 width=4)
-> Sample Scan on test_tablesample t1 (cost=0.00..0.01 rows=1 width=4)
-> Sample Scan on test_tablesample t2 (cost=0.00..4.02 rows=2 width=0)
(3 rows)
Isn't it better to display sample method name in explain.
I think it can help in case of join queries.
We can use below format to display:
Sample Scan (System) on test_tablesample ...
Good idea.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2 April 2015 at 17:36, Petr Jelinek <petr@2ndquadrant.com> wrote:
so here is version 11.
Looks great.
Comment on docs:
The SELECT docs refer only to SYSTEM and BERNOULLI. It doesn't mention
that if other methods are available they could be used also. The
phrasing was "sampling method can be one of <list>."
Are we ready for a final detailed review and commit?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services
On Sat, Apr 4, 2015 at 8:25 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 04/04/15 14:57, Amit Kapila wrote:
1.
+tablesample_clause:
+TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
It seems to me that you want to allow it to make it extendable
to user defined Tablesample methods, but not sure if it is
right to use func_arg_list for the same because sample method
arguments can have different definition than function arguments.
Now even if we want to keep sample method arguments same as
function arguments that sounds like a separate discussion.
Yes I want extensibility here. And I think the tablesample method
arguments are same thing as function arguments given that in the end they
are arguments for the init function of tablesampling method.
I would be ok with just expr_list, naming parameters here isn't overly
important, but ability to have different types and numbers of parameters
for custom TABLESAMPLE methods *is* important.
Yeah, named parameters is one reason which I think won't
be required for sample methods and neither the same is
mentioned in docs (if user has to use, what is the way he
can pass the same) and another is number of arguments
for sampling methods which is currently seems to be same
as FUNC_MAX_ARGS, I think that is sufficient but do we
want to support that many arguments for sampling method.
I have shared my thoughts regarding this point with you, if
you don't agree with the same, then proceed as you think is
the best way.
2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id = 2;
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...
postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
Isn't TABLESAMPLE clause supposed to work with Update/Delete
statements?
It's supported in the FROM part of UPDATE and USING part of DELETE. I
think that that's sufficient.
But I think the Update on target table with sample scan is
supported via views which doesn't seem to be the right thing
in case you just want to support it via FROM/USING, example
postgres=# create view vw_test As select * from test_tablesample TABLESAMPLE system(30);
postgres=# explain update vw_test set id = 4;
QUERY PLAN
---------------------------------------------------------------------------
Update on test_tablesample (cost=0.00..4.04 rows=4 width=210)
-> Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210)
(2 rows)
Standard is somewhat useless for UPDATE and DELETE as it only defines
quite limited syntax there. From what I've seen when doing research MSSQL
also only supports it in their equivalent of FROM/USING list, Oracle does
not seem to support their SAMPLING clause outside of SELECTs at all and if
I got the cryptic DB2 manual correctly I think they don't support it
outside of (sub)SELECTs either.
By the way, what is the usecase to support sample scan in
Update or Delete statement?
Also, isn't it better to mention in the docs for Update and
Delete in case we are going to support tablesample clause
for them?
7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}
What is the reason for coercing value of REPEATABLE clause to INT4OID
when numeric value is expected for the clause. If user gives the
input value as -2.3, it will generate a seed which doesn't seem to
be right.
Because the REPEATABLE is numeric expression so it can produce whatever
number but we need int4 internally (well float4 would also work just the
code would be slightly uglier).
Okay, I understand that part. Here the real point is why not just expose
it as int4 to user rather than telling in docs that it is numeric and
actually we neither expect nor use it as numeric.
Even Oracle supports it as int with below description.
The seed_value must be an integer between 0 and 4294967295
And we do this type of coercion even for table data (you can insert -2.3
into integer column and it will work) so I don't see what's wrong with it
here.
I am not sure we can compare it with column of a table. I think we
can support it within a valid range (similar to tablesample method) and
if user inputs value outside the range, then return error.
8.
+DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
Datatype for second argument is kept as 700 (Float4), shouldn't
it be 1700 (Numeric)?
Why is that?
As we are exposing it as numeric.
Given the sampling error I think the float4 is enough for specifying the
percentage and it makes the calculations much easier and faster than
dealing with Numeric would.
Your explanation makes sense to me and we can leave it as it is.
One more point:
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
In documentation, AS is still after TABLESAMPLE clause even
though you have already changed gram.y for the same.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 06/04/15 12:33, Amit Kapila wrote:
On Sat, Apr 4, 2015 at 8:25 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
Yes I want extensibility here. And I think the tablesample method
arguments are same thing as function arguments given that in the end
they are arguments for the init function of tablesampling method.
I would be ok with just expr_list, naming parameters here isn't
overly important, but ability to have different types and numbers of
parameters for custom TABLESAMPLE methods *is* important.
Yeah, named parameters is one reason which I think won't
be required for sample methods and neither the same is
mentioned in docs (if user has to use, what is the way he
can pass the same) and another is number of arguments
for sampling methods which is currently seems to be same
as FUNC_MAX_ARGS, I think that is sufficient but do we
want to support that many arguments for sampling method.
I think I'll go with simple list of a_exprs. The reason for that is that
I can foresee sampling methods that need multiple parameters, but I
don't think naming them is very important. Also adding support for
naming parameters can be done in the future if we decide so without
breaking compatibility. Side benefit is that it makes hinting about what
is wrong with input somewhat easier.
I don't think we need to come up with different limit from
FUNC_MAX_ARGS. I don't think any sampling method would need that many
parameters but I also don't see what would additional smaller limit give us.
2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id = 2;
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...
postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
Isn't TABLESAMPLE clause supposed to work with Update/Delete
statements?
It's supported in the FROM part of UPDATE and USING part of DELETE. I
think that that's sufficient.
But I think the Update on target table with sample scan is
supported via views which doesn't seem to be the right thing
in case you just want to support it via FROM/USING, example
postgres=# create view vw_test As select * from test_tablesample TABLESAMPLE system(30);
postgres=# explain update vw_test set id = 4;
QUERY PLAN
---------------------------------------------------------------------------
Update on test_tablesample (cost=0.00..4.04 rows=4 width=210)
-> Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210)
(2 rows)
Right, I'll make those views not auto-updatable.
Standard is somewhat useless for UPDATE and DELETE as it only defines
quite limited syntax there. From what I've seen when doing research
MSSQL also only supports it in their equivalent of FROM/USING list,
Oracle does not seem to support their SAMPLING clause outside of SELECTs
at all and if I got the cryptic DB2 manual correctly I think they don't
support it outside of (sub)SELECTs either.
By the way, what is the usecase to support sample scan in
Update or Delete statement?
Well for the USING/FROM part the use-case is same as for SELECT -
providing sample of the data for the query (it can be useful also for
getting pseudo random rows fast). And if we didn't support it, it could
still be done using sub-select so why not have it directly.
Also, isn't it better to mention in the docs for Update and
Delete in case we are going to support tablesample clause
for them?
Most of other clauses that we support in FROM are not mentioned in
UPDATE/DELETE docs, both of those commands just say something like
"refer to the SELECT FROM docs for more info". Do you think TABLESAMPLE
deserves special treatment in this regard?
7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}
What is the reason for coercing value of REPEATABLE clause to INT4OID
when numeric value is expected for the clause. If user gives the
input value as -2.3, it will generate a seed which doesn't seem to
be right.
Because the REPEATABLE is numeric expression so it can produce
whatever number but we need int4 internally (well float4 would also work
just the code would be slightly uglier).
Okay, I understand that part. Here the real point is why not just expose
it as int4 to user rather than telling in docs that it is numeric and
actually we neither expect nor use it as numeric.
Even Oracle supports it as int with below description.
The seed_value must be an integer between 0 and 4294967295
Well the thing with SQL Standard's "numeric value expression" is that it
actually does not mean numeric data type, it's just simple arithmetic
expression with some given rules (by the standard), but the output data
type can be either implementation specific approximate number or
implementation specific exact number (depending on inputs by standard's
definition, but meh). We support a_expr instead which gives much more
flexibility on input. For now I changed wording of the docs to say that
input is a number instead of using word numeric there.
And we do this type of coercion even for table data (you can insert
-2.3 into integer column and it will work) so I don't see what's wrong
with it here.
I am not sure we can compare it with column of a table. I think we
can support it within a valid range (similar to tablesample method) and
if user inputs value outside the range, then return error.
But that's not what standard says, it says any numeric value expression
is valid. The fact that Oracle limits it to some range should not make
us do the same. I think most important thing here is that using -2.3
will produce same results if called repeatedly (if there are no changes
to data, vacuum etc). Yes passing -2 will produce same results, I don't
know if that is a problem. The main reason why I have the coercion there
is so that users don't have to explicitly typecast expression results.
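
The point about -2.3 and -2 giving the same sample can be sketched as follows. This is only an illustration: `sketch_seed_from_number` is a hypothetical helper, and it assumes the coercion to int4 rounds to the nearest integer; it does not reproduce the parser's `coerce_to_specific_type()` call and it ignores out-of-range input.

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn the number given in REPEATABLE (...) into an
 * int4 seed by rounding to the nearest integer.  The rounding rule is an
 * assumption made for this sketch; values that do not fit into int32_t are
 * not handled here.
 */
static int32_t
sketch_seed_from_number(double value)
{
    /* The cast truncates toward zero, so shift by 0.5 away from zero first. */
    return (int32_t) (value >= 0.0 ? value + 0.5 : value - 0.5);
}
```

Under that assumption REPEATABLE (-2.3) and REPEATABLE (-2) end up with the same seed, so both produce the same sample as long as the underlying data does not change.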
8.
+DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
Datatype for second argument is kept as 700 (Float4), shouldn't
it be 1700 (Numeric)?
Why is that?
As we are exposing it as numeric.
See my comment for the REPEATABLE. Checking the docs, I actually wrote
there "floating point" so hopefully it's not confusing.
Given the sampling error I think the float4 is enough for specifying
the percentage and it makes the calculations much easier and faster than
dealing with Numeric would.
Your explanation makes sense to me and we can leave it as it is.
Cool.
One more point:
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
In documentation, AS is still after TABLESAMPLE clause even
though you have already changed gram.y for the same.
Ah right, thanks.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/04/15 11:02, Simon Riggs wrote:
On 2 April 2015 at 17:36, Petr Jelinek <petr@2ndquadrant.com> wrote:
so here is version 11.
Looks great.
Comment on docs:
The SELECT docs refer only to SYSTEM and BERNOULLI. It doesn't mention
that if other methods are available they could be used also. The
phrasing was "sampling method can be one of <list>."
Will reword.
Are we ready for a final detailed review and commit?
I plan to send v12 in the evening with some additional changes that came
up from Amit's comments + some improvements to error reporting. I think
it will be ready then.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Apr 6, 2015 at 5:56 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 06/04/15 12:33, Amit Kapila wrote:
But I think the Update on target table with sample scan is
supported via views which doesn't seem to be the right thing
in case you just want to support it via FROM/USING, example
postgres=# create view vw_test As select * from test_tablesample TABLESAMPLE system(30);
postgres=# explain update vw_test set id = 4;
QUERY PLAN
---------------------------------------------------------------------------
Update on test_tablesample (cost=0.00..4.04 rows=4 width=210)
-> Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210)
(2 rows)
Right, I'll make those views not auto-updatable.
Standard is somewhat useless for UPDATE and DELETE as it only defines
quite limited syntax there. From what I've seen when doing research
MSSQL also only supports it in their equivalent of FROM/USING list,
Oracle does not seem to support their SAMPLING clause outside of SELECTs
at all and if I got the cryptic DB2 manual correctly I think they don't
support it outside of (sub)SELECTs either.
By the way, what is the usecase to support sample scan in
Update or Delete statement?
Well for the USING/FROM part the use-case is same as for SELECT -
providing sample of the data for the query (it can be useful also for
getting pseudo random rows fast). And if we didn't support it, it could
still be done using sub-select so why not have it directly.
I can understand why someone wants to read sample data via
SELECT, but not clearly able to understand, why some one wants
to Update or Delete random data in table and if there is a valid
case, then why just based on sub-selects used in where clause
or table reference in FROM/USING list. Can't we keep it simple
such that either we support to Update/Delete based on Tablesample
clause or prohibit it in all cases?
Also, isn't it better to mention in the docs for Update and
Delete in case we are going to support tablesample clause
for them?
Most of other clauses that we support in FROM are not mentioned in
UPDATE/DELETE docs, both of those commands just say something like "refer
to the SELECT FROM docs for more info". Do you think TABLESAMPLE deserves
special treatment in this regard?
Nothing too important, just as I got confused while using,
someone else can also get confused, but I think we can leave
it.
And we do this type of coercion even for table data (you can insert
-2.3 into integer column and it will work) so I don't see what's wrong
with it here.
I am not sure we can compare it with column of a table. I think we
can support it within a valid range (similar to tablesample method) and
if user inputs value outside the range, then return error.
But that's not what standard says, it says any numeric value expression
is valid. The fact that Oracle limits it to some range should not make us
do the same. I think most important thing here is that using -2.3 will
produce same results if called repeatedly (if there are no changes to data,
vacuum etc). Yes passing -2 will produce same results, I don't know if that
is a problem. The main reason why I have the coercion there is so that
users don't have to explicitly typecast expression results.
Actually, not a big point, but I felt it will be clear if there is a valid
range and actually we are not doing anything with negative (part)
of seed input by the user.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 06/04/15 15:07, Amit Kapila wrote:
On Mon, Apr 6, 2015 at 5:56 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 06/04/15 12:33, Amit Kapila wrote:
But I think the Update on target table with sample scan is
supported via views which doesn't seem to be the right thing
in case you just want to support it via FROM/USING, example
postgres=# create view vw_test As select * from test_tablesample TABLESAMPLE system(30);
postgres=# explain update vw_test set id = 4;
QUERY PLAN
---------------------------------------------------------------------------
Update on test_tablesample (cost=0.00..4.04 rows=4 width=210)
-> Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210)
(2 rows)
Right, I'll make those views not auto-updatable.
Standard is somewhat useless for UPDATE and DELETE as it only defines
quite limited syntax there. From what I've seen when doing research
MSSQL also only supports it in their equivalent of FROM/USING list,
Oracle does not seem to support their SAMPLING clause outside of SELECTs
at all and if I got the cryptic DB2 manual correctly I think they don't
support it outside of (sub)SELECTs either.
By the way, what is the usecase to support sample scan in
Update or Delete statement?
Well for the USING/FROM part the use-case is same as for SELECT -
providing sample of the data for the query (it can be useful also for
getting pseudo random rows fast). And if we didn't support it, it could
still be done using sub-select so why not have it directly.
I can understand why someone wants to read sample data via
SELECT, but not clearly able to understand, why some one wants
to Update or Delete random data in table and if there is a valid
case, then why just based on sub-selects used in where clause
or table reference in FROM/USING list. Can't we keep it simple
such that either we support to Update/Delete based on Tablesample
clause or prohibit it in all cases?
Well, I don't understand why would somebody do it either, but then again
during research of this feature I've found questions on stack overflow
and similar sites about how to do it, so people must have use-cases.
And in any case as you say sub-select would work there so there is no
reason to explicitly disable it. Plus there is already difference
between what can be the target table in DELETE/UPDATE versus what can be
in the FROM/USING clause and I think the TABLESAMPLE behavior follows
that separation nicely - it's well demonstrated by the fact that we
would have to add explicit exception to some places in code to disallow it.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/04/15 14:30, Petr Jelinek wrote:
On 06/04/15 11:02, Simon Riggs wrote:
Are we ready for a final detailed review and commit?
I plan to send v12 in the evening with some additional changes that came
up from Amit's comments + some improvements to error reporting. I think
it will be ready then.
Ok so here it is.
Changes vs v11:
- changed input parameter list to expr_list
- improved error reporting, particularly when input parameters are wrong
- fixed SELECT docs to show correct syntax and mention that there can be
more sampling methods
- added name of the sampling method to the explain output - I don't like
the code much there as it has to look into RTE but on the other hand I
don't want to create new scan node just so it can hold the name of the
sampling method for explain
- made views containing TABLESAMPLE clause not autoupdatable
- added PageIsAllVisible() check before trying to check for tuple visibility
- some typo/white space fixes
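
The PageIsAllVisible() shortcut from the list above can be sketched roughly like this. The types and functions are hypothetical stand-ins (`SketchPage`, `SketchTuple`, `sketch_tuple_visible`), not PostgreSQL's own structures; the sketch only shows the control flow of skipping the per-tuple check when the page-level flag already answers the question.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a heap page and a tuple; not PostgreSQL types. */
typedef struct SketchPage
{
    bool all_visible;   /* plays the role of PageIsAllVisible() */
} SketchPage;

typedef struct SketchTuple
{
    bool visible;       /* result the expensive check would return */
} SketchTuple;

/* Counts how often the expensive per-tuple check actually ran. */
static int sketch_slow_checks = 0;

/* Stand-in for the comparatively expensive per-tuple visibility test. */
static bool
sketch_tuple_visible(const SketchTuple *tuple)
{
    sketch_slow_checks++;
    return tuple->visible;
}

/* Check the cheap page-level flag first; fall back to the tuple test. */
static bool
sketch_sample_tuple_ok(const SketchPage *page, const SketchTuple *tuple)
{
    if (page->all_visible)
        return true;
    return sketch_tuple_visible(tuple);
}
```

On pages marked all-visible the per-tuple test is never reached, which is the saving the change list refers to.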
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-separate-block-sampling-functions-v2.patch (text/x-diff)
From 788bb84c35b0ece6a7e287b67cb924b8af32723c Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:36:56 +0100
Subject: [PATCH 1/4] separate block sampling functions v2
---
contrib/file_fdw/file_fdw.c | 9 +-
contrib/postgres_fdw/postgres_fdw.c | 10 +-
src/backend/commands/analyze.c | 225 +----------------------------------
src/backend/utils/misc/Makefile | 2 +-
src/backend/utils/misc/sampling.c | 226 ++++++++++++++++++++++++++++++++++++
src/include/commands/vacuum.h | 3 -
src/include/utils/sampling.h | 44 +++++++
7 files changed, 287 insertions(+), 232 deletions(-)
create mode 100644 src/backend/utils/misc/sampling.c
create mode 100644 src/include/utils/sampling.h
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 4368897..249d541 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -34,6 +34,7 @@
#include "optimizer/var.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -1005,7 +1006,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
{
int numrows = 0;
double rowstoskip = -1; /* -1 means not set yet */
- double rstate;
+ ReservoirStateData rstate;
TupleDesc tupDesc;
Datum *values;
bool *nulls;
@@ -1043,7 +1044,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
ALLOCSET_DEFAULT_MAXSIZE);
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
@@ -1087,7 +1088,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* not-yet-incremented value of totalrows as t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(*totalrows, targrows, &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, *totalrows, targrows);
if (rowstoskip <= 0)
{
@@ -1095,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 478e124..74ef792 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -37,6 +37,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -202,7 +203,7 @@ typedef struct PgFdwAnalyzeState
/* for random sampling */
double samplerows; /* # of rows fetched */
double rowstoskip; /* # of rows to skip before next sample */
- double rstate; /* random state */
+ ReservoirStateData rstate; /* state for reservoir sampling*/
/* working memory contexts */
MemoryContext anl_cxt; /* context for per-analyze lifespan data */
@@ -2397,7 +2398,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
astate.numrows = 0;
astate.samplerows = 0;
astate.rowstoskip = -1; /* -1 means not set yet */
- astate.rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&astate.rstate, targrows);
/* Remember ANALYZE context, and create a per-tuple temp context */
astate.anl_cxt = CurrentMemoryContext;
@@ -2537,13 +2538,12 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
* analyze.c; see Jeff Vitter's paper.
*/
if (astate->rowstoskip < 0)
- astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
- &astate->rstate);
+ astate->rowstoskip = reservoir_get_next_S(&astate->rstate, astate->samplerows, targrows);
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * anl_random_fract());
+ pos = (int) (targrows * sampler_random_fract());
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 15ec0ad..952cf20 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -89,10 +79,6 @@ static void do_analyze_rel(Relation onerel, int options,
VacuumParams *params, List *va_cols,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -951,94 +937,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1084,7 +982,7 @@ acquire_sample_rows(Relation onerel, int elevel,
BlockNumber totalblocks;
TransactionId OldestXmin;
BlockSamplerData bs;
- double rstate;
+ ReservoirStateData rstate;
Assert(targrows > 0);
@@ -1094,9 +992,9 @@ acquire_sample_rows(Relation onerel, int elevel,
OldestXmin = GetOldestXmin(onerel, true);
/* Prepare for sampling block numbers */
- BlockSampler_Init(&bs, totalblocks, targrows);
+ BlockSampler_Init(&bs, totalblocks, targrows, random());
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Outer loop over blocks to sample */
while (BlockSampler_HasMore(&bs))
@@ -1244,8 +1142,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(samplerows, targrows,
- &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, samplerows, targrows);
if (rowstoskip <= 0)
{
@@ -1253,7 +1150,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1312,116 +1209,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
-/*
- * These two routines embody Algorithm Z from "Random sampling with a
- * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
- * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
- * of the count S of records to skip before processing another record.
- * It is computed primarily based on t, the number of records already read.
- * The only extra state needed between calls is W, a random state variable.
- *
- * anl_init_selection_state computes the initial W value.
- *
- * Given that we've already read t records (t >= n), anl_get_next_S
- * determines the number of records to skip before the next record is
- * processed.
- */
-double
-anl_init_selection_state(int n)
-{
- /* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
-}
-
-double
-anl_get_next_S(double t, int n, double *stateptr)
-{
- double S;
-
- /* The magic constant here is T from Vitter's paper */
- if (t <= (22.0 * n))
- {
- /* Process records using Algorithm X until t is large enough */
- double V,
- quot;
-
- V = anl_random_fract(); /* Generate V */
- S = 0;
- t += 1;
- /* Note: "num" in Vitter's code is always equal to t - n */
- quot = (t - (double) n) / t;
- /* Find min S satisfying (4.1) */
- while (quot > V)
- {
- S += 1;
- t += 1;
- quot *= (t - (double) n) / t;
- }
- }
- else
- {
- /* Now apply Algorithm Z */
- double W = *stateptr;
- double term = t - (double) n + 1;
-
- for (;;)
- {
- double numer,
- numer_lim,
- denom;
- double U,
- X,
- lhs,
- rhs,
- y,
- tmp;
-
- /* Generate U and X */
- U = anl_random_fract();
- X = t * (W - 1.0);
- S = floor(X); /* S is tentatively set to floor(X) */
- /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
- tmp = (t + 1) / term;
- lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
- rhs = (((t + X) / (term + S)) * term) / t;
- if (lhs <= rhs)
- {
- W = rhs / lhs;
- break;
- }
- /* Test if U <= f(S)/cg(X) */
- y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
- if ((double) n < S)
- {
- denom = t;
- numer_lim = term + S;
- }
- else
- {
- denom = t - (double) n + S;
- numer_lim = t + 1;
- }
- for (numer = t + S; numer >= numer_lim; numer -= 1)
- {
- y *= numer / denom;
- denom -= 1;
- }
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
- if (exp(log(y) / n) <= (t + X) / t)
- break;
- }
- *stateptr = W;
- }
- return S;
-}
-
/*
* qsort comparator for sorting rows[] array
*/
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 378b77e..7889101 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o rls.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..1eeabaf
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Relation block sampling routines.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler provides an algorithm for block-level sampling of a relation,
+ * as discussed on pgsql-hackers 2004-04-02 (subject "Large DB").
+ * It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has fewer than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
+ long randseed)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires a sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+/*
+ * These two routines embody Algorithm Z from "Random sampling with a
+ * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
+ * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
+ * of the count S of records to skip before processing another record.
+ * It is computed primarily based on t, the number of records already read.
+ * The only extra state needed between calls is W, a random state variable.
+ *
+ * reservoir_init_selection_state computes the initial W value.
+ *
+ * Given that we've already read t records (t >= n), reservoir_get_next_S
+ * determines the number of records to skip before the next record is
+ * processed.
+ */
+void
+reservoir_init_selection_state(ReservoirState rs, int n)
+{
+ /* Initial value of W (for use when Algorithm Z is first applied) */
+ *rs = exp(-log(sampler_random_fract()) / n);
+}
+
+double
+reservoir_get_next_S(ReservoirState rs, double t, int n)
+{
+ double S;
+
+ /* The magic constant here is T from Vitter's paper */
+ if (t <= (22.0 * n))
+ {
+ /* Process records using Algorithm X until t is large enough */
+ double V,
+ quot;
+
+ V = sampler_random_fract(); /* Generate V */
+ S = 0;
+ t += 1;
+ /* Note: "num" in Vitter's code is always equal to t - n */
+ quot = (t - (double) n) / t;
+ /* Find min S satisfying (4.1) */
+ while (quot > V)
+ {
+ S += 1;
+ t += 1;
+ quot *= (t - (double) n) / t;
+ }
+ }
+ else
+ {
+ /* Now apply Algorithm Z */
+ double W = *rs;
+ double term = t - (double) n + 1;
+
+ for (;;)
+ {
+ double numer,
+ numer_lim,
+ denom;
+ double U,
+ X,
+ lhs,
+ rhs,
+ y,
+ tmp;
+
+ /* Generate U and X */
+ U = sampler_random_fract();
+ X = t * (W - 1.0);
+ S = floor(X); /* S is tentatively set to floor(X) */
+ /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
+ tmp = (t + 1) / term;
+ lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
+ rhs = (((t + X) / (term + S)) * term) / t;
+ if (lhs <= rhs)
+ {
+ W = rhs / lhs;
+ break;
+ }
+ /* Test if U <= f(S)/cg(X) */
+ y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
+ if ((double) n < S)
+ {
+ denom = t;
+ numer_lim = term + S;
+ }
+ else
+ {
+ denom = t - (double) n + S;
+ numer_lim = t + 1;
+ }
+ for (numer = t + S; numer >= numer_lim; numer -= 1)
+ {
+ y *= numer / denom;
+ denom -= 1;
+ }
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ if (exp(log(y) / n) <= (t + X) / t)
+ break;
+ }
+ *rs = W;
+ }
+ return S;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+
+/* Select a random value R uniformly distributed in (0 - 1) */
+double
+sampler_random_fract()
+{
+ return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+}
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 71f0165..ce7b28d 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -197,8 +197,5 @@ extern void analyze_rel(Oid relid, RangeVar *relation, int options,
VacuumParams *params, List *va_cols, bool in_outer_xact,
BufferAccessStrategy bstrategy);
extern bool std_typanalyze(VacAttrStats *stats);
-extern double anl_random_fract(void);
-extern double anl_init_selection_state(int n);
-extern double anl_get_next_S(double t, int n, double *stateptr);
#endif /* VACUUM_H */
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..e3e7f9c
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+extern double sampler_random_fract(void);
+
+/* Block sampling methods */
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize, long randseed);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Reservoir sampling methods */
+typedef double ReservoirStateData;
+typedef ReservoirStateData *ReservoirState;
+
+extern void reservoir_init_selection_state(ReservoirState rs, int n);
+extern double reservoir_get_next_S(ReservoirState rs, double t, int n);
+
+#endif /* SAMPLING_H */
--
1.9.1
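For readers who want to experiment with the block-selection logic outside the backend, here is a minimal standalone sketch of the Algorithm S variant described in the BlockSampler_Next() comment above: one random fraction is drawn per selected block, and the skip cutoff p is shrunk on each skip instead of drawing again. The names SketchSampler and sketch_* are hypothetical, and the small xorshift64* generator is an assumption standing in for random(); none of this is part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for BlockSamplerData; not part of the patch. */
typedef struct
{
	uint32_t	N;		/* number of blocks, known in advance */
	uint32_t	n;		/* desired sample size */
	uint32_t	t;		/* blocks scanned so far (next candidate block) */
	uint32_t	m;		/* blocks selected so far */
	uint64_t	rng;	/* private PRNG state (assumption; patch 1 uses random()) */
} SketchSampler;

/*
 * Return a value in the open interval (0, 1), using the same
 * (r + 1) / (max + 2) mapping as sampler_random_fract(), but fed by a
 * xorshift64* step instead of random().
 */
static double
sketch_random_fract(uint64_t *state)
{
	uint64_t	x = *state;
	uint32_t	r;

	x ^= x >> 12;
	x ^= x << 25;
	x ^= x >> 27;
	*state = x;
	r = (uint32_t) ((x * 2685821657736338717ULL) >> 32);
	return ((double) r + 1.0) / 4294967297.0;	/* 2^32 + 1 */
}

static void
sketch_sampler_init(SketchSampler *bs, uint32_t nblocks, uint32_t samplesize,
					uint64_t seed)
{
	bs->N = nblocks;
	bs->n = samplesize;
	bs->t = 0;
	bs->m = 0;
	bs->rng = seed ? seed : 1;	/* xorshift state must be nonzero */
}

static bool
sketch_sampler_has_more(const SketchSampler *bs)
{
	return (bs->t < bs->N) && (bs->m < bs->n);
}

static uint32_t
sketch_sampler_next(SketchSampler *bs)
{
	uint32_t	K = bs->N - bs->t;	/* remaining blocks */
	uint32_t	k = bs->n - bs->m;	/* blocks still to sample */
	double		p;					/* probability to skip block */
	double		V;					/* random */

	assert(sketch_sampler_has_more(bs));	/* hence K > 0 and k > 0 */

	if (k >= K)
	{
		/* need all the rest */
		bs->m++;
		return bs->t++;
	}

	V = sketch_random_fract(&bs->rng);
	p = 1.0 - (double) k / (double) K;
	while (V < p)
	{
		/* skip this block and shrink the cutoff for the next one */
		bs->t++;
		K--;
		p *= 1.0 - (double) k / (double) K;
	}

	/* select */
	bs->m++;
	return bs->t++;
}
```

A caller would initialize the sampler once and then loop on sketch_sampler_has_more(), fetching each block number with sketch_sampler_next(). Block numbers come out in increasing order, which is what lets the scan stay sequential on disk.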
0002-tablesample-v12.patch (text/x-diff)
From c17558e5da82b2b94460f6889510d54cdae16b16 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:37:55 +0100
Subject: [PATCH 2/4] tablesample v12
---
contrib/file_fdw/file_fdw.c | 2 +-
contrib/postgres_fdw/postgres_fdw.c | 2 +-
doc/src/sgml/catalogs.sgml | 120 ++++++
doc/src/sgml/ref/select.sgml | 44 ++-
src/backend/access/Makefile | 3 +-
src/backend/access/heap/heapam.c | 43 ++-
src/backend/catalog/Makefile | 2 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/explain.c | 13 +
src/backend/executor/Makefile | 2 +-
src/backend/executor/execAmi.c | 8 +
src/backend/executor/execCurrent.c | 1 +
src/backend/executor/execProcnode.c | 14 +
src/backend/executor/nodeSamplescan.c | 562 ++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 60 +++
src/backend/nodes/equalfuncs.c | 37 ++
src/backend/nodes/nodeFuncs.c | 12 +
src/backend/nodes/outfuncs.c | 48 +++
src/backend/nodes/readfuncs.c | 45 +++
src/backend/optimizer/path/allpaths.c | 49 +++
src/backend/optimizer/path/costsize.c | 68 ++++
src/backend/optimizer/plan/createplan.c | 69 ++++
src/backend/optimizer/plan/setrefs.c | 11 +
src/backend/optimizer/plan/subselect.c | 1 +
src/backend/optimizer/util/pathnode.c | 22 ++
src/backend/parser/gram.y | 36 +-
src/backend/parser/parse_clause.c | 47 +++
src/backend/parser/parse_func.c | 143 +++++++
src/backend/rewrite/rewriteHandler.c | 3 +
src/backend/utils/Makefile | 3 +-
src/backend/utils/adt/ruleutils.c | 50 +++
src/backend/utils/cache/lsyscache.c | 27 ++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/misc/sampling.c | 33 +-
src/backend/utils/tablesample/Makefile | 17 +
src/backend/utils/tablesample/bernoulli.c | 224 +++++++++++
src/backend/utils/tablesample/system.c | 185 +++++++++
src/include/access/heapam.h | 4 +
src/include/access/relscan.h | 1 +
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_proc.h | 25 ++
src/include/catalog/pg_tablesample_method.h | 78 ++++
src/include/executor/nodeSamplescan.h | 24 ++
src/include/nodes/execnodes.h | 18 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/parsenodes.h | 36 ++
src/include/nodes/plannodes.h | 6 +
src/include/optimizer/cost.h | 1 +
src/include/optimizer/pathnode.h | 2 +
src/include/parser/kwlist.h | 1 +
src/include/parser/parse_func.h | 5 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 -
src/include/utils/sampling.h | 15 +-
src/include/utils/syscache.h | 2 +
src/include/utils/tablesample.h | 27 ++
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/expected/tablesample.out | 170 +++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/tablesample.sql | 42 +++
61 files changed, 2467 insertions(+), 36 deletions(-)
create mode 100644 src/backend/executor/nodeSamplescan.c
create mode 100644 src/backend/utils/tablesample/Makefile
create mode 100644 src/backend/utils/tablesample/bernoulli.c
create mode 100644 src/backend/utils/tablesample/system.c
create mode 100644 src/include/catalog/pg_tablesample_method.h
create mode 100644 src/include/executor/nodeSamplescan.h
create mode 100644 src/include/utils/tablesample.h
create mode 100644 src/test/regress/expected/tablesample.out
create mode 100644 src/test/regress/sql/tablesample.sql
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d541..6a813a3 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1096,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 74ef792..5903384 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2543,7 +2543,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * sampler_random_fract());
+ pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate));
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d0b78f2..af808a0 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -269,6 +269,11 @@
</row>
<row>
+ <entry><link linkend="catalog-pg-tablesample-method"><structname>pg_tablesample_method</structname></link></entry>
+ <entry>table sampling methods</entry>
+ </row>
+
+ <row>
<entry><link linkend="catalog-pg-tablespace"><structname>pg_tablespace</structname></link></entry>
<entry>tablespaces within this database cluster</entry>
</row>
@@ -5980,6 +5985,121 @@
</sect1>
+ <sect1 id="catalog-pg-tablesample-method">
+ <title><structname>pg_tablesample_method</structname></title>
+
+ <indexterm zone="catalog-pg-tablesample-method">
+ <primary>pg_tablesample_method</primary>
+ </indexterm>
+
+ <para>
+ The catalog <structname>pg_tablesample_method</structname> stores
+ information about table sampling methods which can be used in the
+ <command>TABLESAMPLE</command> clause of a <command>SELECT</command>
+ statement.
+ </para>
+
+ <table>
+ <title><structname>pg_tablesample_method</> Columns</title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+
+ <row>
+ <entry><structfield>oid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry></entry>
+ <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmname</structfield></entry>
+ <entry><type>name</type></entry>
+ <entry></entry>
+ <entry>Name of the sampling method</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmseqscan</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method scan the whole table sequentially?
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmpagemode</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method always read whole pages?
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsminit</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Initialize the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnextblock</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next block number</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnexttuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next tuple offset</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmexaminetuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Function which examines the tuple contents and decides whether
+ to return it, or zero if none</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmend</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>End the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmreset</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Restart the state of sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmcost</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Costing function</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect1>
+
+
<sect1 id="catalog-pg-tablespace">
<title><structname>pg_tablespace</structname></title>
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 2295f63..f70d8c4 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,48 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ A <literal>TABLESAMPLE</literal> clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of the rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ any sampling method installed in the database. There are currently two
+ sampling methods available in the standard
+ <productname>PostgreSQL</productname> distribution:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept only a single argument,
+ which is the percentage (a floating-point value from 0 to 100) of the
+ rows to be returned.
+ The <literal>SYSTEM</literal> sampling method does block-level
+ sampling, with each block having the same chance of being selected, and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> sampling method scans the whole table
+ and returns individual rows with equal probability. Additional sampling
+ methods may be installed in the database via extensions.
+ </para>
+ <para>
+ The optional parameter <literal>REPEATABLE</literal> accepts any number
+ or expression producing a number and is used as the random seed for
+ sampling. Note that subsequent commands may return different results
+ even if the same <literal>REPEATABLE</literal> clause was specified. This
+ happens because <acronym>DML</acronym> statements and maintenance
+ operations such as <command>VACUUM</> may affect the physical distribution
+ of data.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..238057a 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ transam
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb6f8a3..76c2d3a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -79,8 +79,9 @@ bool synchronize_seqscans = true;
static HeapScanDesc heap_beginscan_internal(Relation relation,
Snapshot snapshot,
int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync,
- bool is_bitmapscan, bool temp_snap);
+ bool allow_strat, bool allow_sync, bool allow_pagemode,
+ bool is_bitmapscan, bool is_samplescan,
+ bool temp_snap);
static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -293,9 +294,10 @@ initscan(HeapScanDesc scan, ScanKey key, bool is_rescan)
/*
* Currently, we don't have a stats counter for bitmap heap scans (but the
- * underlying bitmap index scans will be counted).
+ * underlying bitmap index scans will be counted) or sample scans (we only
+ * update stats for tuple fetches there)
*/
- if (!scan->rs_bitmapscan)
+ if (!scan->rs_bitmapscan && !scan->rs_samplescan)
pgstat_count_heap_scan(scan->rs_rd);
}
@@ -314,7 +316,7 @@ heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, BlockNumber numBlks)
* In page-at-a-time mode it performs additional work, namely determining
* which tuples on the page are visible.
*/
-static void
+void
heapgetpage(HeapScanDesc scan, BlockNumber page)
{
Buffer buffer;
@@ -1289,7 +1291,7 @@ heap_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
* heap_beginscan - begin relation scan
*
* heap_beginscan_strat offers an extended API that lets the caller control
- * whether a nondefault buffer access strategy can be used, and whether
+ * whether a nondefault buffer access strategy can be used and whether
* syncscan can be chosen (possibly resulting in the scan not starting from
* block zero). Both of these default to TRUE with plain heap_beginscan.
*
@@ -1297,6 +1299,9 @@ heap_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
* HeapScanDesc for a bitmap heap scan. Although that scan technology is
* really quite unlike a standard seqscan, there is just enough commonality
* to make it worth using the same data structure.
+ *
+ * heap_beginscan_ss is an alternate entry point for setting up a
+ * HeapScanDesc for a TABLESAMPLE scan.
* ----------------
*/
HeapScanDesc
@@ -1304,7 +1309,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- true, true, false, false);
+ true, true, true, false, false, false);
}
HeapScanDesc
@@ -1314,7 +1319,7 @@ heap_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
Snapshot snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- true, true, false, true);
+ true, true, true, false, false, true);
}
HeapScanDesc
@@ -1323,7 +1328,8 @@ heap_beginscan_strat(Relation relation, Snapshot snapshot,
bool allow_strat, bool allow_sync)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- allow_strat, allow_sync, false, false);
+ allow_strat, allow_sync, true,
+ false, false, false);
}
HeapScanDesc
@@ -1331,14 +1337,24 @@ heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- false, false, true, false);
+ false, false, true, true, false, false);
+}
+
+HeapScanDesc
+heap_beginscan_ss(Relation relation, Snapshot snapshot,
+ int nkeys, ScanKey key,
+ bool allow_strat, bool allow_pagemode)
+{
+ return heap_beginscan_internal(relation, snapshot, nkeys, key,
+ allow_strat, false, allow_pagemode,
+ false, true, false);
}
static HeapScanDesc
heap_beginscan_internal(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync,
- bool is_bitmapscan, bool temp_snap)
+ bool allow_strat, bool allow_sync, bool allow_pagemode,
+ bool is_bitmapscan, bool is_samplescan, bool temp_snap)
{
HeapScanDesc scan;
@@ -1360,6 +1376,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
scan->rs_snapshot = snapshot;
scan->rs_nkeys = nkeys;
scan->rs_bitmapscan = is_bitmapscan;
+ scan->rs_samplescan = is_samplescan;
scan->rs_strategy = NULL; /* set in initscan */
scan->rs_allow_strat = allow_strat;
scan->rs_allow_sync = allow_sync;
@@ -1368,7 +1385,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
/*
* we can use page-at-a-time mode if it's an MVCC-safe snapshot
*/
- scan->rs_pageatatime = IsMVCCSnapshot(snapshot);
+ scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(snapshot);
/*
* For a seqscan in a serializable transaction, acquire a predicate lock
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..34db3e6 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesample_method.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 952cf20..65e329e 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1150,7 +1150,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 315a528..aee3c4f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -732,6 +732,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -957,6 +958,15 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ {
+ /* Fetch the tablesample method info from RTE */
+ RangeTblEntry *rte;
+ rte = rt_fetch(((SampleScan *) plan)->scanrelid, es->rtable);
+ custom_name = get_tablesample_method_name(rte->tablesample->tsmid);
+ pname = psprintf("Sample Scan (%s)", custom_name);
+ }
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1074,6 +1084,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1326,6 +1337,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2224,6 +2236,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 6ebad2f..4948a26 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index d87be96..bcd287f 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 9892499..03c2feb 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..065f9f5
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,562 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "catalog/pg_tablesample_method.h"
+#include "executor/executor.h"
+#include "executor/nodeSamplescan.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/predicate.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate,
+ int eflags, TableSampleClause *tablesample);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+static HeapTuple samplenexttup(SampleScanState *node, HeapScanDesc scan);
+
+
+/*
+ * Initialize the sampling method - loads function info and
+ * calls the tsminit function.
+ *
+ * We need special handling for this because the tsminit function
+ * is allowed to take optional additional arguments.
+ */
+static void
+InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(scanstate->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(scanstate->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(scanstate->tsmnexttuple));
+ if (OidIsValid(tablesample->tsmexaminetuple))
+ fmgr_info(tablesample->tsmexaminetuple, &(scanstate->tsmexaminetuple));
+ else
+ scanstate->tsmexaminetuple.fn_oid = InvalidOid;
+ fmgr_info(tablesample->tsmend, &(scanstate->tsmend));
+ fmgr_info(tablesample->tsmreset, &(scanstate->tsmreset));
+
+ InitFunctionCallInfoData(fcinfo, &scanstate->tsminit,
+ list_length(args) + 2,
+ InvalidOid, NULL, NULL);
+
+ /* First arg is always SampleScanState */
+ fcinfo.arg[0] = PointerGetDatum(scanstate);
+ fcinfo.argnull[0] = false;
+
+ /*
+ * Second arg is always REPEATABLE
+ * When tablesample->repeatable is NULL then REPEATABLE clause was not
+ * specified.
+ * When specified, the expression cannot evaluate to NULL.
+ */
+ if (tablesample->repeatable)
+ {
+ ExprState *argstate = ExecInitExpr((Expr *) tablesample->repeatable,
+ (PlanState *) scanstate);
+ fcinfo.arg[1] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[1], NULL);
+ if (fcinfo.argnull[1])
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value")));
+ }
+ else
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+
+ i = 2;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ (void) FunctionCallInvoke(&fcinfo);
+}
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is a workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ TupleTableSlot *slot;
+ HeapScanDesc scan;
+ HeapTuple tuple;
+
+ /*
+ * get information from the scan state
+ */
+ slot = node->ss.ss_ScanTupleSlot;
+ scan = node->ss.ss_currentScanDesc;
+
+ tuple = samplenexttup(node, scan);
+
+ if (tuple)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ scan->rs_cbuf, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+/*
+ * Check visibility of the tuple.
+ */
+static bool
+SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset, HeapScanDesc scan)
+{
+ /*
+ * If this scan is reading whole pages at a time, there is already
+ * visibility info present in rs_vistuples so we can just search it
+ * for the tupoffset.
+ */
+ if (scan->rs_pageatatime)
+ {
+ int start = 0,
+ end = scan->rs_ntuples - 1;
+
+ /*
+ * Do the binary search over rs_vistuples, it's already sorted by
+ * OffsetNumber so we don't need to do any sorting ourselves here.
+ *
+ * We could use bsearch() here but it's slower for integers because
+ * of the function call overhead and because it needs boilerplate code
+ * it would not save us anything code-wise anyway.
+ */
+ while (start <= end)
+ {
+ int mid = start + (end - start) / 2;
+ OffsetNumber curoffset = scan->rs_vistuples[mid];
+
+ if (curoffset == tupoffset)
+ return true;
+ else if (curoffset > tupoffset)
+ end = mid - 1;
+ else
+ start = mid + 1;
+ }
+
+ return false;
+ }
+ else
+ {
+ /* No pagemode, we have to check the tuple itself. */
+ Snapshot snapshot = scan->rs_snapshot;
+ Buffer buffer = scan->rs_cbuf;
+
+ bool visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+ CheckForSerializableConflictOut(visible, scan->rs_rd, tuple, buffer,
+ snapshot);
+
+ return visible;
+ }
+}
+
+/*
+ * Read next tuple using the correct sampling method.
+ */
+static HeapTuple
+samplenexttup(SampleScanState *node, HeapScanDesc scan)
+{
+ HeapTuple tuple = &(scan->rs_ctup);
+ bool pagemode = scan->rs_pageatatime;
+ BlockNumber blockno;
+ Page page;
+ bool page_all_visible;
+ ItemId itemid;
+ OffsetNumber tupoffset,
+ maxoffset;
+
+ if (!scan->rs_inited)
+ {
+ /*
+ * return null immediately if relation is empty
+ */
+ if (scan->rs_nblocks == 0)
+ {
+ Assert(!BufferIsValid(scan->rs_cbuf));
+ tuple->t_data = NULL;
+ return NULL;
+ }
+ blockno = DatumGetInt32(FunctionCall1(&node->tsmnextblock,
+ PointerGetDatum(node)));
+ if (!BlockNumberIsValid(blockno))
+ {
+ tuple->t_data = NULL;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+ scan->rs_inited = true;
+ }
+ else
+ {
+ /* continue from previously returned page/tuple */
+ blockno = scan->rs_cblock; /* current page */
+ }
+
+ /*
+ * When pagemode is disabled, the scan will do visibility checks for each
+ * tuple it finds so the buffer needs to be locked.
+ */
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ page_all_visible = PageIsAllVisible(page);
+ maxoffset = PageGetMaxOffsetNumber(page);
+
+ for (;;)
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ tupoffset = DatumGetUInt16(FunctionCall3(&node->tsmnexttuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset)));
+
+ if (OffsetNumberIsValid(tupoffset))
+ {
+ bool visible;
+ bool found;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_data = (HeapTupleHeader) PageGetItem((Page) page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&(tuple->t_self), blockno, tupoffset);
+
+ if (page_all_visible)
+ visible = true;
+ else
+ visible = SampleTupleVisible(tuple, tupoffset, scan);
+
+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples for
+ * statistical purposes, but not return them since user should
+ * never see invisible tuples.
+ */
+ if (OidIsValid(node->tsmexaminetuple.fn_oid))
+ {
+ found = DatumGetBool(FunctionCall4(&node->tsmexaminetuple,
+ PointerGetDatum(node),
+ UInt32GetDatum(blockno),
+ PointerGetDatum(tuple),
+ BoolGetDatum(visible)));
+ /* Should not happen if sampling method is well written. */
+ if (found && !visible)
+ elog(ERROR, "sampling method wanted to return invisible tuple");
+ }
+ else
+ found = visible;
+
+ /* Found visible tuple, return it. */
+ if (found)
+ {
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ break;
+ }
+ else
+ {
+ /* Try next tuple from same page. */
+ continue;
+ }
+ }
+
+
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
+ blockno = DatumGetInt32(FunctionCall1(&node->tsmnextblock,
+ PointerGetDatum(node)));
+
+ /*
+ * Report our new scan position for synchronization purposes. We
+ * don't do that when moving backwards, however. That would just
+ * mess up any other forward-moving scanners.
+ *
+ * Note: we do this before checking for end of scan so that the
+ * final state of the position hint is back at the start of the
+ * rel. That's not strictly necessary, but otherwise when you run
+ * the same query multiple times the starting position would shift
+ * a little bit backwards on every invocation, which is confusing.
+ * We don't guarantee any specific ordering in general, though.
+ */
+ if (scan->rs_syncscan)
+ ss_report_location(scan->rs_rd, BlockNumberIsValid(blockno) ?
+ blockno : scan->rs_startblock);
+
+ /*
+ * Reached end of scan.
+ */
+ if (!BlockNumberIsValid(blockno))
+ {
+ if (BufferIsValid(scan->rs_cbuf))
+ ReleaseBuffer(scan->rs_cbuf);
+ scan->rs_cbuf = InvalidBuffer;
+ scan->rs_cblock = InvalidBlockNumber;
+ tuple->t_data = NULL;
+ scan->rs_inited = false;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ page_all_visible = PageIsAllVisible(page);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ pgstat_count_heap_getnext(scan->rs_rd);
+
+ return &(scan->rs_ctup);
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation using the sampling method and returns
+ * the next qualifying tuple.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+ TableSampleClause *tablesample)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+
+ /*
+ * Even though we aren't going to do a conventional seqscan, it is useful
+ * to create a HeapScanDesc --- many of the fields in it are usable.
+ */
+ node->ss.ss_currentScanDesc =
+ heap_beginscan_ss(currentRelation, estate->es_snapshot, 0, NULL,
+ tablesample->tsmseqscan, tablesample->tsmpagemode);
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags, rte->tablesample);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ InitSamplingMethod(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell sampling function that we finished the scan.
+ */
+ FunctionCall1(&node->tsmend, PointerGetDatum(node));
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close heap scan
+ */
+ heap_endscan(node->ss.ss_currentScanDesc);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *node)
+{
+ heap_rescan(node->ss.ss_currentScanDesc, NULL);
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ FunctionCall1(&node->tsmreset, PointerGetDatum(node));
+
+ ExecScanReScan(&node->ss);
+}
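
Note on SampleTupleVisible above: in page-at-a-time mode the visibility check reduces to a membership test on a sorted array of offsets. The following is a minimal standalone sketch of that lookup, not part of the patch. The OffsetNumber typedef is a stand-in for the PostgreSQL type, and offset_is_visible is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's OffsetNumber (an assumption for this sketch). */
typedef uint16_t OffsetNumber;

/*
 * Return true if 'target' occurs in vis[0..n-1].
 * The array must be sorted in ascending order, as rs_vistuples is
 * described to be in the patch comment.
 */
static bool
offset_is_visible(const OffsetNumber *vis, int n, OffsetNumber target)
{
	int start = 0;
	int end = n - 1;

	assert(n >= 0);

	while (start <= end)
	{
		/* Midpoint computed this way cannot overflow int. */
		int mid = start + (end - start) / 2;

		if (vis[mid] == target)
			return true;
		else if (vis[mid] > target)
			end = mid - 1;		/* continue in the lower half */
		else
			start = mid + 1;	/* continue in the upper half */
	}

	return false;
}
```

With a visible-offset array of {1, 3, 4, 7, 12}, a lookup for 7 succeeds and a lookup for 5 fails; an empty array (n = 0) never matches.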
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 029761e..1a4c85b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -630,6 +630,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2015,6 +2031,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2147,6 +2164,40 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsmseqscan);
+ COPY_SCALAR_FIELD(tsmpagemode);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmexaminetuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_SCALAR_FIELD(tsmcost);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4084,6 +4135,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4732,6 +4786,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 190e50a..27626b5 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2318,6 +2318,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2437,6 +2438,36 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsmseqscan);
+ COMPARE_SCALAR_FIELD(tsmpagemode);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmexaminetuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_SCALAR_FIELD(tsmcost);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3155,6 +3186,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_FuncWithArgs:
retval = _equalFuncWithArgs(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index d6f1f5b..7742189 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3219,6 +3219,18 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->repeatable, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 385b289..e26dbf0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -580,6 +580,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -2404,6 +2412,36 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_BOOL_FIELD(tsmseqscan);
+ WRITE_BOOL_FIELD(tsmpagemode);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmexaminetuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2433,6 +2471,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2931,6 +2970,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3272,6 +3314,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 563209c..05ed9a8 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,46 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_BOOL_FIELD(tsmseqscan);
+ READ_BOOL_FIELD(tsmpagemode);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmexaminetuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_OID_FIELD(tsmcost);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1218,6 +1258,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1313,6 +1354,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..5f12477 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,10 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -265,6 +269,11 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_size(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Sampled relation */
+ set_tablesample_rel_size(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -332,6 +341,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +432,41 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_size
+ * Set size estimates for a sampled relation.
+ */
+static void
+set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ /* Mark rel with estimated output rows, width, etc */
+ set_baserel_size_estimates(root, rel);
+}
+
+/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+ Path *path;
+
+ /*
+ * We don't support pushing join clauses into the quals of a sample scan, but
+ * it could still have required parameterization due to LATERAL refs in
+ * its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ path = create_samplescan_path(root, rel, required_outer);
+ rel->pathlist = list_make1(path);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 1a0d358..51583a1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -90,6 +90,7 @@
#include "utils/lsyscache.h"
#include "utils/selfuncs.h"
#include "utils/spccache.h"
+#include "utils/tablesample.h"
#include "utils/tuplesort.h"
@@ -220,6 +221,73 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * From the planner/optimizer perspective, we don't care all that much about
+ * the cost itself, since there is always only one scan path to consider when
+ * a sampling scan is present, but the row count estimate is still important.
+ *
+ * 'baserel' is the relation to be scanned
+ * 'path' is the path to cost; its param_info is set if it is parameterized
+ */
+void
+cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+ BlockNumber pages;
+ double tuples;
+ RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
+ TableSampleClause *tablesample = rte->tablesample;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Mark the path with the correct row estimate */
+ if (path->param_info)
+ path->rows = path->param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ /* Call the sampling method's costing function. */
+ OidFunctionCall6(tablesample->tsmcost, PointerGetDatum(root),
+ PointerGetDatum(path), PointerGetDatum(baserel),
+ PointerGetDatum(tablesample->args),
+ PointerGetDatum(&pages), PointerGetDatum(&tuples));
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+
+ spc_page_cost = tablesample->tsmseqscan ? spc_seq_page_cost :
+ spc_random_page_cost;
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_page_cost * pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, path->param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * tuples;
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index cb69c03..3fc84e2 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -58,6 +58,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -100,6 +102,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -228,6 +231,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -343,6 +347,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -546,6 +557,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1133,6 +1145,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3321,6 +3372,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 94b12ab..a2ae940 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -445,6 +445,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index acfd0bc..9971b54 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2167,6 +2167,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index faca30b..ea7a47b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,26 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+Path *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ Path *pathnode = makeNode(Path);
+
+ pathnode->pathtype = T_SampleScan;
+ pathnode->parent = rel;
+ pathnode->param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->pathkeys = NIL; /* samplescan has unordered result */
+
+ cost_samplescan(pathnode, root, rel);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1778,6 +1798,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return (Path *) create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 88ec83c..8ee6b40 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -448,6 +448,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -615,8 +616,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10197,6 +10198,10 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample
+ {
+ $$ = (Node *) $1;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10522,6 +10527,32 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr opt_alias_clause tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $3;
+ n->relation = $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' expr_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = $4;
+ n->repeatable = $6;
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' a_expr ')' { $$ = (Node *) $3; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13612,6 +13643,7 @@ type_func_name_keyword:
| OVERLAPS
| RIGHT
| SIMILAR
+ | TABLESAMPLE
| VERBOSE
;
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 8d90b50..44a3021 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -414,6 +417,29 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
+{
+ RangeTblEntry *rte;
+ TableSampleClause *tablesample = NULL;
+
+ /* We first need to build a range table entry */
+ rte = transformTableEntry(pstate, r->relation);
+
+ if (rte->relkind != RELKIND_RELATION &&
+ rte->relkind != RELKIND_MATVIEW)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("TABLESAMPLE clause can only be used on tables and materialized views"),
+ parser_errposition(pstate, r->relation->location)));
+
+ tablesample = ParseTableSample(pstate, r->method, r->repeatable, r->args,
+ r->relation->location);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -1122,6 +1148,27 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 1385776..ab87635 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesample_method.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -767,6 +769,147 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
+ List *sampleargs, int location)
+{
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, initnargs;
+ Oid init_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ samplemethod),
+ parser_errposition(pstate, location)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+
+ tablesample->tsmseqscan = tsm->tsmseqscan;
+ tablesample->tsmpagemode = tsm->tsmpagemode;
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmexaminetuple = tsm->tsmexaminetuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+ tablesample->tsmcost = tsm->tsmcost;
+
+ ReleaseSysCache(tuple);
+
+ /* Validate the parameters against init function definition. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ initnargs = procform->pronargs;
+ Assert(initnargs >= 3);
+
+ /*
+ * The first parameter is used to pass the SampleScanState and the second
+ * is the seed (REPEATABLE); skip processing them here and just assert
+ * that the types are correct.
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ Assert(procform->proargtypes.values[1] == INT4OID);
+ initnargs -= 2;
+ memcpy(init_arg_types, procform->proargtypes.values + 2,
+ initnargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Process repeatable (seed) */
+ if (repeatable != NULL)
+ {
+ Node *arg = repeatable;
+
+ if (arg && IsA(arg, A_Const))
+ {
+ A_Const *con = (A_Const *) arg;
+
+ if (con->val.type == T_Null)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value"),
+ parser_errposition(pstate, con->location)));
+
+ }
+
+ arg = transformExpr(pstate, arg, EXPR_KIND_FROM_FUNCTION);
+ arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
+ tablesample->repeatable = arg;
+ }
+ else
+ tablesample->repeatable = NULL;
+
+ /* Check that the user provided the expected number of arguments. */
+ if (list_length(sampleargs) != initnargs)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_ARGUMENTS),
+ errmsg_plural("tablesample method \"%s\" expects %d argument got %d",
+ "tablesample method \"%s\" expects %d arguments got %d",
+ initnargs,
+ samplemethod,
+ initnargs, list_length(sampleargs)),
+ parser_errposition(pstate, location)));
+
+ /* Transform the arguments, typecasting them as needed. */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *inarg = (Node *) lfirst(larg);
+ Node *arg = transformExpr(pstate, inarg, EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ if (argtype != init_arg_types[nargs])
+ {
+ if (!can_coerce_type(1, &argtype, &init_arg_types[nargs],
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameter %d for tablesample method \"%s\"",
+ nargs + 1, samplemethod),
+ errdetail("Expected type %s got %s.",
+ format_type_be(init_arg_types[nargs]),
+ format_type_be(argtype)),
+ parser_errposition(pstate, exprLocation(inarg))));
+
+ arg = coerce_type(pstate, arg, argtype, init_arg_types[nargs], -1,
+ COERCION_IMPLICIT, COERCE_IMPLICIT_CAST, -1);
+ }
+
+ fargs = lappend(fargs, arg);
+ nargs++;
+ }
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 9d2c280..385ae9c 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -2160,6 +2160,9 @@ view_query_is_auto_updatable(Query *viewquery, bool check_cols)
base_rte->relkind != RELKIND_VIEW))
return gettext_noop("Views that do not select from a single table or view are not automatically updatable.");
+ if (base_rte->tablesample)
+ return gettext_noop("Views containing TABLESAMPLE are not automatically updatable.");
+
/*
* Check that the view has at least one updatable column. This is required
* for INSERT/UPDATE but not for DELETE.
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 8374533..9daa2ae 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -9,7 +9,8 @@ top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS = fmgrtab.o
-SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time
+SUBDIRS = adt cache error fmgr hash init mb misc mmgr resowner sort time \
+ tablesample
# location of Catalog.pm
catalogdir = $(top_srcdir)/src/backend/catalog
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 29b5b1b..a6bd34c 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -32,6 +32,7 @@
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -344,6 +345,8 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags);
static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags, int wrapColumn);
+static void get_tablesample_def(TableSampleClause *tablesample,
+ deparse_context *context);
static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
TupleDesc resultDesc,
int prettyFlags, int wrapColumn, int startIndent);
@@ -4185,6 +4188,50 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
heap_close(ev_relation, AccessShareLock);
}
+/* ----------
+ * get_tablesample_def - Convert TableSampleClause back to SQL
+ * ----------
+ */
+static void
+get_tablesample_def(TableSampleClause *tablesample, deparse_context *context)
+{
+ StringInfo buf = context->buf;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ char *tsmname;
+ int nargs;
+ ListCell *l;
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tablesample->tsmid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("cache lookup failed for tablesample method %u",
+ tablesample->tsmid)));
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ tsmname = NameStr(tsm->tsmname);
+ appendStringInfo(buf, " TABLESAMPLE %s (", quote_identifier(tsmname));
+
+ ReleaseSysCache(tuple);
+
+ nargs = 0;
+ foreach(l, tablesample->args)
+ {
+ if (nargs++ > 0)
+ appendStringInfoString(buf, ", ");
+ get_rule_expr((Node *) lfirst(l), context, true);
+ }
+ appendStringInfoChar(buf, ')');
+
+ if (tablesample->repeatable != NULL)
+ {
+ appendStringInfoString(buf, " REPEATABLE (");
+ get_rule_expr(tablesample->repeatable, context, true);
+ appendStringInfoChar(buf, ')');
+ }
+}
/* ----------
* get_query_def - Parse back one query parsetree
@@ -8453,6 +8500,9 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
only_marker(rte),
generate_relation_name(rte->relid,
context->namespaces));
+
+ if (rte->tablesample)
+ get_tablesample_def(rte->tablesample, context);
break;
case RTE_SUBQUERY:
/* Subquery RTE */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 818c2f6..311364c 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -29,6 +29,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_range.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_type.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -2910,3 +2911,29 @@ get_range_subtype(Oid rangeOid)
else
return InvalidOid;
}
+
+/* ---------- PG_TABLESAMPLE_METHOD CACHE ---------- */
+
+/*
+ * get_tablesample_method_name - given a tablesample method OID,
+ * look up the name or NULL if not found
+ */
+char *
+get_tablesample_method_name(Oid tsmid)
+{
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tsmid));
+ if (HeapTupleIsValid(tuple))
+ {
+ Form_pg_tablesample_method tup =
+ (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ char *result;
+
+ result = pstrdup(NameStr(tup->tsmname));
+ ReleaseSysCache(tuple);
+ return result;
+ }
+ else
+ return NULL;
+}
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..3a8f01e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesample_method_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index 1eeabaf..f213c46 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -46,6 +46,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
bs->n = samplesize;
bs->t = 0; /* blocks scanned so far */
bs->m = 0; /* blocks selected so far */
+
+ sampler_random_init_state(randseed, bs->randstate);
}
bool
@@ -92,7 +94,7 @@ BlockSampler_Next(BlockSampler bs)
* less than k, which means that we cannot fail to select enough blocks.
*----------
*/
- V = sampler_random_fract();
+ V = sampler_random_fract(bs->randstate);
p = 1.0 - (double) k / (double) K;
while (V < p)
{
@@ -126,8 +128,14 @@ BlockSampler_Next(BlockSampler bs)
void
reservoir_init_selection_state(ReservoirState rs, int n)
{
+ /*
+ * Reservoir sampling is not used anywhere that needs to return
+ * repeatable results, so we can seed it randomly.
+ */
+ sampler_random_init_state(random(), rs->randstate);
+
/* Initial value of W (for use when Algorithm Z is first applied) */
- *rs = exp(-log(sampler_random_fract()) / n);
+ rs->W = exp(-log(sampler_random_fract(rs->randstate)) / n);
}
double
@@ -142,7 +150,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
double V,
quot;
- V = sampler_random_fract(); /* Generate V */
+ V = sampler_random_fract(rs->randstate); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -158,7 +166,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
else
{
/* Now apply Algorithm Z */
- double W = *rs;
+ double W = rs->W;
double term = t - (double) n + 1;
for (;;)
@@ -174,7 +182,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
tmp;
/* Generate U and X */
- U = sampler_random_fract();
+ U = sampler_random_fract(rs->randstate);
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -203,11 +211,11 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract(rs->randstate)) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
- *rs = W;
+ rs->W = W;
}
return S;
}
@@ -217,10 +225,17 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
* Random number generator used by sampling
*----------
*/
+void
+sampler_random_init_state(long seed, SamplerRandomState randstate)
+{
+ randstate[0] = 0x330e;
+ randstate[1] = (unsigned short) seed;
+ randstate[2] = (unsigned short) (seed >> 16);
+}
/* Select a random value R uniformly distributed in (0 - 1) */
double
-sampler_random_fract()
+sampler_random_fract(SamplerRandomState randstate)
{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+ return 1.0 - pg_erand48(randstate); /* (0, 1]: keeps log() callers away from 0 */
}
diff --git a/src/backend/utils/tablesample/Makefile b/src/backend/utils/tablesample/Makefile
new file mode 100644
index 0000000..df92939
--- /dev/null
+++ b/src/backend/utils/tablesample/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for utils/tablesample
+#
+# IDENTIFICATION
+# src/backend/utils/tablesample/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/utils/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = system.o bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/tablesample/bernoulli.c b/src/backend/utils/tablesample/bernoulli.c
new file mode 100644
index 0000000..36f4bcb
--- /dev/null
+++ b/src/backend/utils/tablesample/bernoulli.c
@@ -0,0 +1,224 @@
+/*-------------------------------------------------------------------------
+ *
+ * bernoulli.c
+ * interface routines for BERNOULLI tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/relscan.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* number of blocks */
+ BlockNumber blockno; /* current block */
+ float4 probability; /* probability that tuple will be returned (0.0-1.0) */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} BernoulliSamplerData;
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ HeapScanDesc scan = scanstate->ss.ss_currentScanDesc;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->startblock = scan->rs_startblock;
+ sampler->nblocks = scan->rs_nblocks;
+ sampler->blockno = InvalidBlockNumber;
+ sampler->probability = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ /*
+ * Bernoulli sampling scans all blocks of the table and supports
+ * syncscan, so loop from startblock around to startblock instead of
+ * from 0 to nblocks.
+ */
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = sampler->startblock;
+ else
+ {
+ sampler->blockno++;
+
+ if (sampler->blockno >= sampler->nblocks)
+ sampler->blockno = 0;
+
+ if (sampler->blockno == sampler->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ *
+ * This method implements the main logic of Bernoulli sampling.
+ * The algorithm simply generates a new random number (in the 0.0-1.0 range)
+ * and, if it falls within the user-specified probability (in the same range),
+ * returns the tuple offset.
+ *
+ * If we reach the end of the block, return InvalidOffsetNumber, which tells
+ * SampleScan to go to the next block.
+ */
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ float4 probability = sampler->probability;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /*
+ * Loop over tuple offsets until the random generator returns a value that
+ * is within the probability of returning the tuple, or until we reach
+ * the end of the block.
+ *
+ * (This is our implementation of a Bernoulli trial.)
+ */
+ while (sampler_random_fract(sampler->randstate) > probability)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) scanstate->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_bernoulli_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ *pages = baserel->pages;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1f;
+ }
+
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/backend/utils/tablesample/system.c b/src/backend/utils/tablesample/system.c
new file mode 100644
index 0000000..07d1f3a
--- /dev/null
+++ b/src/backend/utils/tablesample/system.c
@@ -0,0 +1,185 @@
+/*-------------------------------------------------------------------------
+ *
+ * system.c
+ * interface routines for system tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/utils/tablesample/system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+
+/*
+ * State
+ */
+typedef struct
+{
+ BlockSamplerData bs;
+ uint32 seed; /* random seed */
+ BlockNumber tblocks; /* total blocks in relation */
+ int samplesize; /* number of blocks to return */
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be a numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->tblocks =
+ RelationGetNumberOfBlocks(scanstate->ss.ss_currentRelation);
+ sampler->samplesize = 1 + (int) (sampler->tblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ scanstate->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses the same logic as ANALYZE for picking the random blocks.
+ */
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) scanstate->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ BlockSampler_Init(&sampler->bs, sampler->tblocks, sampler->samplesize,
+ sampler->seed);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1f;
+ }
+
+ *pages = baserel->pages * samplesize;
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 888cce7..69cc702 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -113,8 +113,12 @@ extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot,
bool allow_strat, bool allow_sync);
extern HeapScanDesc heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key);
+extern HeapScanDesc heap_beginscan_ss(Relation relation, Snapshot snapshot,
+ int nkeys, ScanKey key,
+ bool allow_strat, bool allow_pagemode);
extern void heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk,
BlockNumber endBlk);
+extern void heapgetpage(HeapScanDesc scan, BlockNumber page);
extern void heap_rescan(HeapScanDesc scan, ScanKey key);
extern void heap_endscan(HeapScanDesc scan);
extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 9bb6362..e2b2b4f 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -29,6 +29,7 @@ typedef struct HeapScanDescData
int rs_nkeys; /* number of scan keys */
ScanKey rs_key; /* array of scan key descriptors */
bool rs_bitmapscan; /* true if this is really a bitmap scan */
+ bool rs_samplescan; /* true if this is really a sample scan */
bool rs_pageatatime; /* verify visibility page-at-a-time? */
bool rs_allow_strat; /* allow or disallow use of access strategy */
bool rs_allow_sync; /* allow or disallow use of syncscan */
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..e01bd0c 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_name_index, 3291, on pg_tablesample_method using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3291
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_oid_index, 3292, on pg_tablesample_method using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3292
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index d90ecc5..91aab0d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5182,6 +5182,31 @@ DESCR("rank of hypothetical row without gaps");
DATA(insert OID = 3993 ( dense_rank_final PGNSP PGUID 12 1 0 2276 0 f f f f f f i 2 0 20 "2281 2276" "{2281,2276}" "{i,v}" _null_ _null_ hypothetical_dense_rank_final _null_ _null_ _null_ ));
DESCR("aggregate final function");
+DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3296 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3297 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3298 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3299 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+DATA(insert OID = 3300 ( tsm_system_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_system_cost _null_ _null_ _null_ ));
+DESCR("tsm_system_cost(internal)");
+
+DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3302 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3303 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3304 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3306 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
+DATA(insert OID = 3307 ( tsm_bernoulli_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_cost(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
new file mode 100644
index 0000000..a58e1cf
--- /dev/null
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -0,0 +1,78 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesample_method.h
+ * definition of the table sampling methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesample_method.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLE_METHOD_H
+#define PG_TABLESAMPLE_METHOD_H
+
+#include "catalog/genbki.h"
+#include "catalog/objectaddress.h"
+
+/* ----------------
+ * pg_tablesample_method definition. cpp turns this into
+ * typedef struct FormData_pg_tablesample_method
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3290
+
+CATALOG(pg_tablesample_method,3290)
+{
+ NameData tsmname; /* tablesample method name */
+ bool tsmseqscan; /* does this method scan whole table sequentially? */
+ bool tsmpagemode; /* does this method scan page at a time? */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockNumber if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmexaminetuple; /* optional function which can examine tuple contents and
+ decide if tuple should be returned or not */
+ regproc tsmend; /* end scan function */
+ regproc tsmreset; /* reset state - used by rescan */
+ regproc tsmcost; /* costing function */
+} FormData_pg_tablesample_method;
+
+/* ----------------
+ * Form_pg_tablesample_method corresponds to a pointer to a tuple with
+ * the format of pg_tablesample_method relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
+
+/* ----------------
+ * compiler constants for pg_tablesample_method
+ * ----------------
+ */
+#define Natts_pg_tablesample_method 10
+#define Anum_pg_tablesample_method_tsmname 1
+#define Anum_pg_tablesample_method_tsmseqscan 2
+#define Anum_pg_tablesample_method_tsmpagemode 3
+#define Anum_pg_tablesample_method_tsminit 4
+#define Anum_pg_tablesample_method_tsmnextblock 5
+#define Anum_pg_tablesample_method_tsmnexttuple 6
+#define Anum_pg_tablesample_method_tsmexaminetuple 7
+#define Anum_pg_tablesample_method_tsmend 8
+#define Anum_pg_tablesample_method_tsmreset 9
+#define Anum_pg_tablesample_method_tsmcost 10
+
+/* ----------------
+ * initial contents of pg_tablesample_method
+ * ----------------
+ */
+
+DATA(insert OID = 3293 ( system false true tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3294 ( bernoulli true false tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ac75f86..20edee4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1216,6 +1216,24 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+
+ /* Sampling method functions. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmexaminetuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+
+ void *tsmdata; /* for use by table sampling method */
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 38469ef..caaedbf 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,6 +61,7 @@ typedef enum NodeTag
T_ValuesScan,
T_CteScan,
T_WorkTableScan,
+ T_SampleScan,
T_ForeignScan,
T_CustomScan,
T_Join,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -414,6 +416,8 @@ typedef enum NodeTag
T_WithClause,
T_CommonTableExpr,
T_RoleSpec,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 0e257ac..a4288d1 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -334,6 +334,26 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - information about a sampling method
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ bool tsmseqscan;
+ bool tsmpagemode;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmexaminetuple;
+ Oid tsmend;
+ Oid tsmreset;
+ Oid tsmcost;
+ Node *repeatable;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -534,6 +554,21 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than the SQL standard so we pass generic function
+ * arguments to the sampling method.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ Node *repeatable;
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -769,6 +804,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 21cbfa8..ddc3708 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -279,6 +279,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..24003ae 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9923f0e..89c8ded 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c243ec..ae90df8 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 3264691..40c007c 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,11 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod,
+ Node *repeatable, List *args,
+ int location);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 2f5ede1..78175c1 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -152,6 +152,7 @@ extern void free_attstatsslot(Oid atttype,
float4 *numbers, int nnumbers);
extern char *get_namespace_name(Oid nspid);
extern Oid get_range_subtype(Oid rangeOid);
+extern char *get_tablesample_method_name(Oid tsmid);
#define type_is_array(typid) (get_element_type(typid) != InvalidOid)
/* type_is_array_domain accepts both plain arrays and domains over arrays */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 9e17d87..fd40366 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index e3e7f9c..4ac208d 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -15,7 +15,12 @@
#include "storage/bufmgr.h"
-extern double sampler_random_fract(void);
+/* Random generator for sampling code */
+typedef unsigned short SamplerRandomState[3];
+
+extern void sampler_random_init_state(long seed,
+ SamplerRandomState randstate);
+extern double sampler_random_fract(SamplerRandomState randstate);
/* Block sampling methods */
/* Data structure for Algorithm S from Knuth 3.4.2 */
@@ -25,6 +30,7 @@ typedef struct
int n; /* desired sample size */
BlockNumber t; /* current block number */
int m; /* blocks selected so far */
+ SamplerRandomState randstate; /* random generator state */
} BlockSamplerData;
typedef BlockSamplerData *BlockSampler;
@@ -35,7 +41,12 @@ extern bool BlockSampler_HasMore(BlockSampler bs);
extern BlockNumber BlockSampler_Next(BlockSampler bs);
/* Reservoid sampling methods */
-typedef double ReservoirStateData;
+typedef struct
+{
+ double W;
+ SamplerRandomState randstate; /* random generator state */
+} ReservoirStateData;
+
typedef ReservoirStateData *ReservoirState;
extern void reservoir_init_selection_state(ReservoirState rs, int n);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..6b628f6 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/include/utils/tablesample.h b/src/include/utils/tablesample.h
new file mode 100644
index 0000000..1a24cb6
--- /dev/null
+++ b/src/include/utils/tablesample.h
@@ -0,0 +1,27 @@
+/*--------------------------------------------------------------------------
+ * tablesample.h
+ * Header file for builtin table sampling methods.
+ *
+ * Copyright (c) 2006-2014, PostgreSQL Global Development Group
+ *
+ * src/include/utils/tablesample.h
+ *--------------------------------------------------------------------------
+ */
+#ifndef TABLESAMPLE_H
+#define TABLESAMPLE_H
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_system_cost(PG_FUNCTION_ARGS);
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_cost(PG_FUNCTION_ARGS);
+
+#endif /* TABLESAMPLE_H */
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..5946edf 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesample_method|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..6a89689
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,170 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT t.id FROM test_tablesample AS t TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+ count
+-------
+ 10
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+ pg_get_viewdef
+--------------------------------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system (((10 * 2))::real) REPEATABLE (2);
+(1 row)
+
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+ pg_get_viewdef
+-----------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system ((99)::real);
+(1 row)
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+CLOSE tablesample_cur;
+END;
+-- should fail
+SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1);
+ERROR: TABLESAMPLE clause can only be used on tables and materialized views
+LINE 1: SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1)...
+ ^
+DROP TABLE test_tablesample CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to view test_tablesample_v1
+drop cascades to view test_tablesample_v2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 6d3b865..300e1fb 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address tablesample
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 8326894..d815496 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -153,3 +153,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..d0c069c
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,42 @@
+CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT t.id FROM test_tablesample AS t TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+CLOSE tablesample_cur;
+END;
+
+-- should fail
+SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1);
+
+DROP TABLE test_tablesample CASCADE;
--
1.9.1
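
For readers skimming the patch above, here is a minimal standalone sketch of the two selection rules it uses: the per-tuple Bernoulli trial from tsm_bernoulli_nexttuple and the block-count formula from tsm_system_init. This is not code from the patch. The demo_* and Demo* names, the xorshift generator standing in for sampler_random_fract, and the local OffsetNumber definitions are all assumptions made so the example compiles on its own. Unlike the patch, the loop here checks the block bound before drawing a random number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PostgreSQL definitions used by the patch. */
typedef uint16_t OffsetNumber;
#define InvalidOffsetNumber ((OffsetNumber) 0)
#define FirstOffsetNumber   ((OffsetNumber) 1)

/* Simplified sampler state; the patch keeps this in BernoulliSamplerData. */
typedef struct
{
	uint32_t	rng;			/* generator state, must be nonzero */
	double		probability;	/* percent / 100 */
	OffsetNumber lt;			/* last tuple offset returned */
} DemoBernoulliState;

/*
 * Assumed replacement for sampler_random_fract(): xorshift32 scaled to a
 * fraction strictly between 0 and 1.
 */
static double
demo_random_fract(uint32_t *state)
{
	uint32_t	x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x / 4294967296.0;
}

/*
 * Return the next sampled tuple offset in the current block, or
 * InvalidOffsetNumber once the block is exhausted. Each candidate offset
 * gets one independent trial with success chance 'probability'.
 */
static OffsetNumber
demo_bernoulli_nexttuple(DemoBernoulliState *st, OffsetNumber maxoffset)
{
	OffsetNumber off;

	if (st->lt == InvalidOffsetNumber)
		off = FirstOffsetNumber;
	else
		off = (OffsetNumber) (st->lt + 1);

	while (off <= maxoffset)
	{
		if (demo_random_fract(&st->rng) <= st->probability)
		{
			st->lt = off;
			return off;
		}
		off++;
	}

	/* Tell the caller that the next block is wanted. */
	st->lt = InvalidOffsetNumber;
	return InvalidOffsetNumber;
}

/*
 * Number of blocks the SYSTEM method asks the block sampler for, using the
 * same formula as tsm_system_init in the patch.
 */
static int
demo_system_samplesize(uint32_t tblocks, float percent)
{
	assert(percent >= 0 && percent <= 100);
	return 1 + (int) (tblocks * (percent / 100.0));
}
```

A caller would invoke demo_bernoulli_nexttuple repeatedly for one block until it returns InvalidOffsetNumber, then move to the next block with lt already reset. With probability 1.0 every offset is returned in order; with probability 0.0 the first call reports the block as exhausted.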
Attachment: 0003-tablesample-ddl-v7.patch (text/x-diff)
From f5d469e898305ece335dff83ecb66bc59d4df81a Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:51:44 +0100
Subject: [PATCH 3/4] tablesample-ddl v7
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_tablesamplemethod.sgml | 184 ++++++++++
doc/src/sgml/ref/drop_tablesamplemethod.sgml | 87 +++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/dependency.c | 15 +-
src/backend/catalog/objectaddress.c | 65 +++-
src/backend/commands/Makefile | 6 +-
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/tablecmds.c | 1 +
src/backend/commands/tablesample.c | 398 +++++++++++++++++++++
src/backend/parser/gram.y | 14 +-
src/backend/tcop/utility.c | 12 +
src/backend/utils/cache/lsyscache.c | 31 ++
src/bin/pg_dump/common.c | 5 +
src/bin/pg_dump/pg_dump.c | 177 +++++++++
src/bin/pg_dump/pg_dump.h | 11 +-
src/bin/pg_dump/pg_dump_sort.c | 11 +-
src/include/catalog/dependency.h | 1 +
src/include/catalog/pg_tablesample_method.h | 10 +
src/include/nodes/parsenodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/test/modules/Makefile | 3 +-
src/test/modules/tablesample/.gitignore | 4 +
src/test/modules/tablesample/Makefile | 21 ++
.../modules/tablesample/expected/tablesample.out | 38 ++
src/test/modules/tablesample/sql/tablesample.sql | 14 +
src/test/modules/tablesample/tsm_test--1.0.sql | 52 +++
src/test/modules/tablesample/tsm_test.c | 228 ++++++++++++
src/test/modules/tablesample/tsm_test.control | 5 +
31 files changed, 1396 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/create_tablesamplemethod.sgml
create mode 100644 doc/src/sgml/ref/drop_tablesamplemethod.sgml
create mode 100644 src/backend/commands/tablesample.c
create mode 100644 src/test/modules/tablesample/.gitignore
create mode 100644 src/test/modules/tablesample/Makefile
create mode 100644 src/test/modules/tablesample/expected/tablesample.out
create mode 100644 src/test/modules/tablesample/sql/tablesample.sql
create mode 100644 src/test/modules/tablesample/tsm_test--1.0.sql
create mode 100644 src/test/modules/tablesample/tsm_test.c
create mode 100644 src/test/modules/tablesample/tsm_test.control
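
A note on the INIT function check in this patch: get_tablesample_method_func()
accepts an init function with three or more arguments whose first two are of
type internal and integer. The rule can be sketched in standalone C as below.
This is only an illustration, not code from the patch: the Oid typedef, the
SKETCH_* constants, init_signature_ok() and init_signature_examples() are
stand-ins chosen so that the snippet compiles outside the backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the backend's Oid type (assumption, for illustration). */
typedef unsigned int Oid;

/* Stand-in type OIDs; the backend takes the real ones from pg_type.h. */
#define SKETCH_INTERNALOID 2281u
#define SKETCH_INT4OID     23u
#define SKETCH_FLOAT4OID   700u

/*
 * Hypothetical helper mirroring the test applied to each candidate:
 * at least 3 arguments, and the first two types must match exactly.
 */
bool
init_signature_ok(const Oid *argtypes, int nargs)
{
	const Oid	required[2] = {SKETCH_INTERNALOID, SKETCH_INT4OID};

	/* nargs is checked first, so memcmp never reads past a short list. */
	return nargs >= 3 &&
		memcmp(required, argtypes, sizeof(required)) == 0;
}

/* Small self-check: one accepted signature and two rejected ones. */
void
init_signature_examples(void)
{
	const Oid	good[] = {SKETCH_INTERNALOID, SKETCH_INT4OID, SKETCH_FLOAT4OID};
	const Oid	too_short[] = {SKETCH_INTERNALOID, SKETCH_INT4OID};
	const Oid	wrong_types[] = {SKETCH_INT4OID, SKETCH_INTERNALOID, SKETCH_FLOAT4OID};

	assert(init_signature_ok(good, 3));
	assert(!init_signature_ok(too_short, 2));
	assert(!init_signature_ok(wrong_types, 3));
}
```

Under this rule a function declared with arguments (internal, integer, real)
would pass; the types after the second argument are not constrained by the
check and are left to the author of the sampling method.
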
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 5b4692f..d31a2db 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -78,6 +78,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createServer SYSTEM "create_server.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
+<!ENTITY createTablesampleMethod SYSTEM "create_tablesamplemethod.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
<!ENTITY createTrigger SYSTEM "create_trigger.sgml">
<!ENTITY createTSConfig SYSTEM "create_tsconfig.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
+<!ENTITY dropTablesampleMethod SYSTEM "drop_tablesamplemethod.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTrigger SYSTEM "drop_trigger.sgml">
<!ENTITY dropTSConfig SYSTEM "drop_tsconfig.sgml">
diff --git a/doc/src/sgml/ref/create_tablesamplemethod.sgml b/doc/src/sgml/ref/create_tablesamplemethod.sgml
new file mode 100644
index 0000000..ff105d2
--- /dev/null
+++ b/doc/src/sgml/ref/create_tablesamplemethod.sgml
@@ -0,0 +1,184 @@
+<!--
+doc/src/sgml/ref/create_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATETABLESAMPLEMETHOD">
+ <indexterm zone="sql-createtablesamplemethod">
+ <primary>CREATE TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE TABLESAMPLE METHOD</refname>
+ <refpurpose>define a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE TABLESAMPLE METHOD <replaceable class="parameter">name</replaceable> (
+ INIT = <replaceable class="parameter">init_function</replaceable> ,
+ NEXTBLOCK = <replaceable class="parameter">nextblock_function</replaceable> ,
+ NEXTTUPLE = <replaceable class="parameter">nexttuple_function</replaceable> ,
+ END = <replaceable class="parameter">end_function</replaceable> ,
+ RESET = <replaceable class="parameter">reset_function</replaceable> ,
+ COST = <replaceable class="parameter">cost_function</replaceable>
+ [ , EXAMINETUPLE = <replaceable class="parameter">examinetuple_function</replaceable> ]
+ [ , SEQSCAN = <replaceable class="parameter">seqscan</replaceable> ]
+ [ , PAGEMODE = <replaceable class="parameter">pagemode</replaceable> ]
+)
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>CREATE TABLESAMPLE METHOD</command> creates a tablesample method.
+ A tablesample method provides an algorithm for reading a sample of a table
+ when used in the <command>TABLESAMPLE</> clause of a <command>SELECT</>
+ statement.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>CREATE TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of the tablesample method to be created. This name must be
+ unique within the database.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">init_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the init function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nextblock_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-block function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nexttuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-tuple function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">end_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the end function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">reset_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the reset function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">cost_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the costing function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">examinetuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the function that inspects the tuple contents to decide
+ whether the tuple should be returned. This parameter
+ is optional.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">seqscan</replaceable></term>
+ <listitem>
+ <para>
+ True if the sampling method performs a sequential scan of the whole table.
+ Used for cost estimation and syncscan. The default value if not specified
+ is False.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">pagemode</replaceable></term>
+ <listitem>
+ <para>
+ True if the sampling method reads a whole page at a time. The default
+ value if not specified is False.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>
+ The function names can be schema-qualified if necessary. Argument types
+ are not given, since the argument list for each type of function is
+ predetermined. All functions except the examinetuple function are required.
+ </para>
+
+ <para>
+ The arguments can appear in any order, not only the one shown above.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>CREATE TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-droptablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_tablesamplemethod.sgml b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
new file mode 100644
index 0000000..dffd2ec
--- /dev/null
+++ b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
@@ -0,0 +1,87 @@
+<!--
+doc/src/sgml/ref/drop_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPTABLESAMPLEMETHOD">
+ <indexterm zone="sql-droptablesamplemethod">
+ <primary>DROP TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP TABLESAMPLE METHOD</refname>
+ <refpurpose>remove a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP TABLESAMPLE METHOD [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP TABLESAMPLE METHOD</command> drops an existing tablesample
+ method.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>DROP TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the tablesample method does not exist.
+ A notice is issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an existing tablesample method to be removed.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>DROP TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createtablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 65ad795..4f55893 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -106,6 +106,7 @@
&createServer;
&createTable;
&createTableAs;
+ &createTablesampleMethod;
&createTableSpace;
&createTSConfig;
&createTSDictionary;
@@ -147,6 +148,7 @@
&dropSequence;
&dropServer;
&dropTable;
+ &dropTablesampleMethod;
&dropTableSpace;
&dropTSConfig;
&dropTSDictionary;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index bacb242..6acb5b3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -46,6 +46,7 @@
#include "catalog/pg_policy.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -157,7 +158,8 @@ static const Oid object_classes[MAX_OCLASS] = {
DefaultAclRelationId, /* OCLASS_DEFACL */
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
- PolicyRelationId /* OCLASS_POLICY */
+ PolicyRelationId, /* OCLASS_POLICY */
+ TableSampleMethodRelationId /* OCLASS_TABLESAMPLEMETHOD */
};
@@ -1265,6 +1267,10 @@ doDeletion(const ObjectAddress *object, int flags)
RemovePolicyById(object->objectId);
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ RemoveTablesampleMethodById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -1794,6 +1800,10 @@ find_expr_references_walker(Node *node,
case RTE_RELATION:
add_object_address(OCLASS_CLASS, rte->relid, 0,
context->addrs);
+ if (rte->tablesample)
+ add_object_address(OCLASS_TABLESAMPLEMETHOD,
+ rte->tablesample->tsmid, 0,
+ context->addrs);
break;
default:
break;
@@ -2373,6 +2383,9 @@ getObjectClass(const ObjectAddress *object)
case PolicyRelationId:
return OCLASS_POLICY;
+
+ case TableSampleMethodRelationId:
+ return OCLASS_TABLESAMPLEMETHOD;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index e82a448..3f47076 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -44,6 +44,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_policy.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -429,7 +430,19 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
- }
+ },
+ {
+ TableSampleMethodRelationId,
+ TableSampleMethodOidIndexId,
+ TABLESAMPLEMETHODOID,
+ TABLESAMPLEMETHODNAME,
+ Anum_pg_tablesample_method_tsmname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ true
+ },
};
/*
@@ -528,7 +541,9 @@ ObjectTypeMap[] =
/* OCLASS_EVENT_TRIGGER */
{ "event trigger", OBJECT_EVENT_TRIGGER },
/* OCLASS_POLICY */
- { "policy", OBJECT_POLICY }
+ { "policy", OBJECT_POLICY },
+ /* OCLASS_TABLESAMPLEMETHOD */
+ { "tablesample method", OBJECT_TABLESAMPLEMETHOD }
};
const ObjectAddress InvalidObjectAddress =
@@ -683,6 +698,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_FDW:
case OBJECT_FOREIGN_SERVER:
case OBJECT_EVENT_TRIGGER:
+ case OBJECT_TABLESAMPLEMETHOD:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -921,6 +937,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_EVENT_TRIGGER:
msg = gettext_noop("event trigger name cannot be qualified");
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -981,6 +1000,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_event_trigger_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ address.classId = TableSampleMethodRelationId;
+ address.objectId = get_tablesample_method_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -2044,6 +2068,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
break;
case OBJECT_TSPARSER:
case OBJECT_TSTEMPLATE:
+ case OBJECT_TABLESAMPLEMETHOD:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -2982,6 +3007,21 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("tablesample method %s"),
+ NameStr(((Form_pg_tablesample_method) GETSTRUCT(tup))->tsmname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -3459,6 +3499,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "policy");
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ appendStringInfoString(&buffer, "tablesample method");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4381,6 +4425,23 @@ getObjectIdentityParts(const ObjectAddress *object,
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+ Form_pg_tablesample_method tsmForm;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ tsmForm = (Form_pg_tablesample_method) GETSTRUCT(tup);
+ appendStringInfoString(&buffer,
+ quote_identifier(NameStr(tsmForm->tsmname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..04fcd8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o tablecmds.o tablesample.o \
+ tablespace.o trigger.o tsearchcmds.o typecmds.o user.o vacuum.o \
+ vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index a1b0d4d..c307dcf 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -429,6 +429,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
}
}
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
default:
elog(ERROR, "unexpected object type (%d)", (int) objtype);
break;
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 4bcc327..9e9ef38 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -97,6 +97,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SEQUENCE", true},
{"SERVER", true},
{"TABLE", true},
+ {"TABLESAMPLE METHOD", true},
{"TABLESPACE", false},
{"TRIGGER", true},
{"TEXT SEARCH CONFIGURATION", true},
@@ -1089,6 +1090,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_SEQUENCE:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
+ case OBJECT_TABLESAMPLEMETHOD:
case OBJECT_TRIGGER:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
@@ -1146,6 +1148,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_TABLESAMPLEMETHOD:
return true;
case MAX_OCLASS:
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 06e4332..e527caf 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8236,6 +8236,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
case OCLASS_USER_MAPPING:
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
+ case OCLASS_TABLESAMPLEMETHOD:
/*
* We don't expect any of these sorts of objects to depend on
diff --git a/src/backend/commands/tablesample.c b/src/backend/commands/tablesample.c
new file mode 100644
index 0000000..33581f6
--- /dev/null
+++ b/src/backend/commands/tablesample.c
@@ -0,0 +1,398 @@
+/*-------------------------------------------------------------------------
+ *
+ * tablesample.c
+ * Commands to manipulate tablesample methods
+ *
+ * Table sampling methods provide algorithms for doing sample scan over
+ * the table.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/tablesample.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "parser/parse_func.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+
+
+static Datum
+get_tablesample_method_func(DefElem *defel, int attnum)
+{
+ List *funcName = defGetQualifiedName(defel);
+ /* Large enough for the longest fixed signature (tsmcost, 7 arguments). */
+ Oid *typeId = palloc0(7 * sizeof(Oid));
+ Oid retTypeId;
+ int nargs;
+ Oid procOid = InvalidOid;
+ FuncCandidateList clist;
+
+ switch (attnum)
+ {
+ case Anum_pg_tablesample_method_tsminit:
+ /*
+ * tsminit needs special handling: it must accept 3 or more
+ * arguments, but only the first two have fixed types; the types of
+ * the remaining arguments are up to the tablesample method creator.
+ */
+ {
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ retTypeId = VOIDOID;
+
+ clist = FuncnameGetCandidates(funcName, -1, NIL, false, false, false);
+
+ while (clist)
+ {
+ if (clist->nargs >= 3 &&
+ memcmp(typeId, clist->args, nargs * sizeof(Oid)) == 0)
+ {
+ procOid = clist->oid;
+ /* Save real function signature for future errors. */
+ nargs = clist->nargs;
+ pfree(typeId);
+ typeId = clist->args;
+ break;
+ }
+ clist = clist->next;
+ }
+
+ if (!OidIsValid(procOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("function \"%s\" does not exist or does not have valid signature",
+ NameListToString(funcName)),
+ errhint("The tablesample method init function "
+ "must have at least 3 input parameters, "
+ "the first of type INTERNAL and the second of type INTEGER.")));
+ }
+ break;
+
+ case Anum_pg_tablesample_method_tsmnextblock:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = INT4OID;
+ break;
+ case Anum_pg_tablesample_method_tsmnexttuple:
+ nargs = 3;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INT2OID;
+ retTypeId = INT2OID;
+ break;
+ case Anum_pg_tablesample_method_tsmexaminetuple:
+ nargs = 4;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = BOOLOID;
+ retTypeId = BOOLOID;
+ break;
+ case Anum_pg_tablesample_method_tsmend:
+ case Anum_pg_tablesample_method_tsmreset:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ case Anum_pg_tablesample_method_tsmcost:
+ nargs = 7;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INTERNALOID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = INTERNALOID;
+ typeId[4] = INTERNALOID;
+ typeId[5] = INTERNALOID;
+ typeId[6] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ default:
+ /* should not be here */
+ elog(ERROR, "unrecognized attribute for tablesample method: %d",
+ attnum);
+ nargs = 0; /* keep compiler quiet */
+ }
+
+ if (!OidIsValid(procOid))
+ procOid = LookupFuncName(funcName, nargs, typeId, false);
+ if (get_func_rettype(procOid) != retTypeId)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("function %s should return type %s",
+ func_signature_string(funcName, nargs, NIL, typeId),
+ format_type_be(retTypeId))));
+
+ return ObjectIdGetDatum(procOid);
+}
+
+/*
+ * make pg_depend entries for a new pg_tablesample_method entry
+ */
+static void
+makeTablesampleMethodDeps(HeapTuple tuple)
+{
+ Form_pg_tablesample_method tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ ObjectAddress myself,
+ referenced;
+
+ myself.classId = TableSampleMethodRelationId;
+ myself.objectId = HeapTupleGetOid(tuple);
+ myself.objectSubId = 0;
+
+ /* dependency on extension */
+ recordDependencyOnCurrentExtension(&myself, false);
+
+ /* dependencies on functions */
+ referenced.classId = ProcedureRelationId;
+ referenced.objectSubId = 0;
+
+ referenced.objectId = tsm->tsminit;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnextblock;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnexttuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ if (OidIsValid(tsm->tsmexaminetuple))
+ {
+ referenced.objectId = tsm->tsmexaminetuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ }
+
+ referenced.objectId = tsm->tsmend;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmreset;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmcost;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+}
+
+/*
+ * Create a table sampling method
+ *
+ * Only superusers can create table sampling methods.
+ */
+ObjectAddress
+DefineTablesampleMethod(List *names, List *parameters)
+{
+ char *tsmname = strVal(linitial(names));
+ Oid tsmoid;
+ ListCell *pl;
+ Relation rel;
+ Datum values[Natts_pg_tablesample_method];
+ bool nulls[Natts_pg_tablesample_method];
+ HeapTuple tuple;
+ ObjectAddress address;
+
+ /* Must be superuser. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to create tablesample method \"%s\"",
+ tsmname),
+ errhint("Must be superuser to create a tablesample method.")));
+
+ /* Must not already exist. */
+ tsmoid = get_tablesample_method_oid(tsmname, true);
+ if (OidIsValid(tsmoid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("tablesample method \"%s\" already exists",
+ tsmname)));
+
+ /* Initialize the values. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_tablesample_method_tsmname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(tsmname));
+
+ /*
+ * loop over the definition list and extract the information we need.
+ */
+ foreach(pl, parameters)
+ {
+ DefElem *defel = (DefElem *) lfirst(pl);
+
+ if (pg_strcasecmp(defel->defname, "seqscan") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmseqscan - 1] =
+ BoolGetDatum(defGetBoolean(defel));
+ }
+ else if (pg_strcasecmp(defel->defname, "pagemode") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmpagemode - 1] =
+ BoolGetDatum(defGetBoolean(defel));
+ }
+ else if (pg_strcasecmp(defel->defname, "init") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsminit - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsminit);
+ }
+ else if (pg_strcasecmp(defel->defname, "nextblock") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnextblock - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnextblock);
+ }
+ else if (pg_strcasecmp(defel->defname, "nexttuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnexttuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnexttuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "examinetuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmexaminetuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmexaminetuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "end") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmend - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmend);
+ }
+ else if (pg_strcasecmp(defel->defname, "reset") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmreset - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmreset);
+ }
+ else if (pg_strcasecmp(defel->defname, "cost") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmcost - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmcost);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("tablesample method parameter \"%s\" not recognized",
+ defel->defname)));
+ }
+
+ /*
+ * Validation.
+ */
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsminit - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method init function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnextblock - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nextblock function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnexttuple - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nexttuple function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmend - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method end function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmreset - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method reset function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmcost - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method cost function is required")));
+
+ /*
+ * Insert tuple into pg_tablesample_method.
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = heap_form_tuple(rel->rd_att, values, nulls);
+
+ tsmoid = simple_heap_insert(rel, tuple);
+
+ CatalogUpdateIndexes(rel, tuple);
+
+ makeTablesampleMethodDeps(tuple);
+
+ heap_freetuple(tuple);
+
+ /* Post creation hook for new tablesample method */
+ InvokeObjectPostCreateHook(TableSampleMethodRelationId, tsmoid, 0);
+
+ ObjectAddressSet(address, TableSampleMethodRelationId, tsmoid);
+
+ heap_close(rel, RowExclusiveLock);
+
+ return address;
+}
+
+/*
+ * Drop a tablesample method.
+ */
+void
+RemoveTablesampleMethodById(Oid tsmoid)
+{
+ Relation rel;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+
+ /*
+ * Find the target tuple
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tsmoid));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ tsmoid);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ /* Can't drop builtin tablesample methods. */
+ if (tsmoid == TABLESAMPLE_METHOD_SYSTEM_OID ||
+ tsmoid == TABLESAMPLE_METHOD_BERNOULLI_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied for tablesample method %s",
+ NameStr(tsm->tsmname))));
+
+ /*
+ * Remove the pg_tablesample_method tuple (this will roll back if we fail below)
+ */
+ simple_heap_delete(rel, &tuple->t_self);
+
+ ReleaseSysCache(tuple);
+
+ heap_close(rel, RowExclusiveLock);
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8ee6b40..e696ea9 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -590,7 +590,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
- MAPPING MATCH MATERIALIZED MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
+ MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P
+ MOVE
NAME_P NAMES NATIONAL NATURAL NCHAR NEXT NO NONE
NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -5103,6 +5104,15 @@ DefineStmt:
n->definition = list_make1(makeDefElem("from", (Node *) $5));
$$ = (Node *)n;
}
+ | CREATE TABLESAMPLE METHOD name definition
+ {
+ DefineStmt *n = makeNode(DefineStmt);
+ n->kind = OBJECT_TABLESAMPLEMETHOD;
+ n->args = NIL;
+ n->defnames = list_make1(makeString($4));
+ n->definition = $5;
+ $$ = (Node *)n;
+ }
;
definition: '(' def_list ')' { $$ = $2; }
@@ -5557,6 +5567,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | TABLESAMPLE METHOD { $$ = OBJECT_TABLESAMPLEMETHOD; }
;
any_name_list:
@@ -13416,6 +13427,7 @@ unreserved_keyword:
| MATCH
| MATERIALIZED
| MAXVALUE
+ | METHOD
| MINUTE_P
| MINVALUE
| MODE
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index fd09d3a..cadf6b4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -23,6 +23,7 @@
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/namespace.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/toasting.h"
#include "commands/alter.h"
#include "commands/async.h"
@@ -1136,6 +1137,11 @@ ProcessUtilitySlow(Node *parsetree,
Assert(stmt->args == NIL);
DefineCollation(stmt->defnames, stmt->definition);
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ Assert(stmt->args == NIL);
+ Assert(list_length(stmt->defnames) == 1);
+ DefineTablesampleMethod(stmt->defnames, stmt->definition);
+ break;
default:
elog(ERROR, "unrecognized define stmt type: %d",
(int) stmt->kind);
@@ -2004,6 +2010,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_POLICY:
tag = "DROP POLICY";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "DROP TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
@@ -2100,6 +2109,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_COLLATION:
tag = "CREATE COLLATION";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "CREATE TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 311364c..3d2c959 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2915,6 +2915,37 @@ get_range_subtype(Oid rangeOid)
/* ---------- PG_TABLESAMPLE_METHOD CACHE ---------- */
/*
+ * get_tablesample_method_oid - given a tablesample method name,
+ * look up the OID
+ *
+ * If missing_ok is false, throw an error if tablesample method name not found.
+ * If true, just return InvalidOid.
+ */
+Oid
+get_tablesample_method_oid(const char *tsmname, bool missing_ok)
+{
+ Oid result;
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(tsmname));
+ if (HeapTupleIsValid(tuple))
+ {
+ result = HeapTupleGetOid(tuple);
+ ReleaseSysCache(tuple);
+ }
+ else
+ result = InvalidOid;
+
+ if (!OidIsValid(result) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ tsmname)));
+
+ return result;
+}
+
+/*
* get_tablesample_method_name - given a tablesample method OID,
* look up the name or NULL if not found
*/
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 1a0a587..8a64e4b 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -103,6 +103,7 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
int numForeignServers;
int numDefaultACLs;
int numEventTriggers;
+ int numTSMs;
if (g_verbose)
write_msg(NULL, "reading schemas\n");
@@ -251,6 +252,10 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
write_msg(NULL, "reading policies\n");
getPolicies(fout, tblinfo, numTables);
+ if (g_verbose)
+ write_msg(NULL, "reading tablesample methods\n");
+ getTableSampleMethods(fout, &numTSMs);
+
*numTablesPtr = numTables;
return tblinfo;
}
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7da5c41..324ca4e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -182,6 +182,7 @@ static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo);
static void dumpIndex(Archive *fout, DumpOptions *dopt, IndxInfo *indxinfo);
static void dumpConstraint(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
static void dumpTableConstraintComment(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
+static void dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo);
static void dumpTSParser(Archive *fout, DumpOptions *dopt, TSParserInfo *prsinfo);
static void dumpTSDictionary(Archive *fout, DumpOptions *dopt, TSDictInfo *dictinfo);
static void dumpTSTemplate(Archive *fout, DumpOptions *dopt, TSTemplateInfo *tmplinfo);
@@ -7134,6 +7135,78 @@ getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tblinfo, int numTable
}
/*
+ * getTableSampleMethods:
+ * read all tablesample methods in the system catalogs and return them
+ * in the TSMInfo* structure
+ *
+ * numTSMs is set to the number of tablesample methods read in
+ */
+TSMInfo *
+getTableSampleMethods(Archive *fout, int *numTSMs)
+{
+ PGresult *res;
+ int ntups;
+ int i;
+ PQExpBuffer query;
+ TSMInfo *tsminfo;
+ int i_tableoid,
+ i_oid,
+ i_tsmname,
+ i_tsmseqscan,
+ i_tsmpagemode;
+
+ /* Before 9.5, there were no tablesample methods */
+ if (fout->remoteVersion < 90500)
+ {
+ *numTSMs = 0;
+ return NULL;
+ }
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT tableoid, oid, tsmname, tsmseqscan, tsmpagemode "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid >= '%u'::pg_catalog.oid",
+ FirstNormalObjectId);
+
+ res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+ ntups = PQntuples(res);
+ *numTSMs = ntups;
+
+ tsminfo = (TSMInfo *) pg_malloc(ntups * sizeof(TSMInfo));
+
+ i_tableoid = PQfnumber(res, "tableoid");
+ i_oid = PQfnumber(res, "oid");
+ i_tsmname = PQfnumber(res, "tsmname");
+ i_tsmseqscan = PQfnumber(res, "tsmseqscan");
+ i_tsmpagemode = PQfnumber(res, "tsmpagemode");
+
+ for (i = 0; i < ntups; i++)
+ {
+ tsminfo[i].dobj.objType = DO_TABLESAMPLE_METHOD;
+ tsminfo[i].dobj.catId.tableoid = atooid(PQgetvalue(res, i, i_tableoid));
+ tsminfo[i].dobj.catId.oid = atooid(PQgetvalue(res, i, i_oid));
+ AssignDumpId(&tsminfo[i].dobj);
+ tsminfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_tsmname));
+ tsminfo[i].dobj.namespace = NULL;
+ tsminfo[i].tsmseqscan = PQgetvalue(res, i, i_tsmseqscan)[0] == 't';
+ tsminfo[i].tsmpagemode = PQgetvalue(res, i, i_tsmpagemode)[0] == 't';
+
+ /* Decide whether we want to dump it */
+ selectDumpableObject(&(tsminfo[i].dobj));
+ }
+
+ PQclear(res);
+
+ destroyPQExpBuffer(query);
+
+ return tsminfo;
+}
+
+
+/*
* Test whether a column should be printed as part of table's CREATE TABLE.
* Column number is zero-based.
*
@@ -8226,6 +8299,9 @@ dumpDumpableObject(Archive *fout, DumpOptions *dopt, DumpableObject *dobj)
case DO_POLICY:
dumpPolicy(fout, dopt, (PolicyInfo *) dobj);
break;
+ case DO_TABLESAMPLE_METHOD:
+ dumpTableSampleMethod(fout, dopt, (TSMInfo *) dobj);
+ break;
case DO_PRE_DATA_BOUNDARY:
case DO_POST_DATA_BOUNDARY:
/* never dumped, nothing to do */
@@ -12226,6 +12302,106 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
}
/*
+ * dumpTableSampleMethod
+ * write the declaration of one user-defined tablesample method
+ */
+static void
+dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo)
+{
+ PGresult *res;
+ PQExpBuffer q;
+ PQExpBuffer delq;
+ PQExpBuffer labelq;
+ PQExpBuffer query;
+ char *tsminit;
+ char *tsmnextblock;
+ char *tsmnexttuple;
+ char *tsmexaminetuple;
+ char *tsmend;
+ char *tsmreset;
+ char *tsmcost;
+
+ /* Skip if not to be dumped */
+ if (!tsminfo->dobj.dump || dopt->dataOnly)
+ return;
+
+ q = createPQExpBuffer();
+ delq = createPQExpBuffer();
+ labelq = createPQExpBuffer();
+ query = createPQExpBuffer();
+
+ /* Make sure we are in proper schema */
+ selectSourceSchema(fout, "pg_catalog");
+
+ appendPQExpBuffer(query, "SELECT tsminit, tsmnextblock, "
+ "tsmnexttuple, tsmexaminetuple, "
+ "tsmend, tsmreset, tsmcost "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid = '%u'::pg_catalog.oid",
+ tsminfo->dobj.catId.oid);
+
+ res = ExecuteSqlQueryForSingleRow(fout, query->data);
+
+ tsminit = PQgetvalue(res, 0, PQfnumber(res, "tsminit"));
+ tsmnexttuple = PQgetvalue(res, 0, PQfnumber(res, "tsmnexttuple"));
+ tsmnextblock = PQgetvalue(res, 0, PQfnumber(res, "tsmnextblock"));
+ tsmexaminetuple = PQgetvalue(res, 0, PQfnumber(res, "tsmexaminetuple"));
+ tsmend = PQgetvalue(res, 0, PQfnumber(res, "tsmend"));
+ tsmreset = PQgetvalue(res, 0, PQfnumber(res, "tsmreset"));
+ tsmcost = PQgetvalue(res, 0, PQfnumber(res, "tsmcost"));
+
+ appendPQExpBuffer(q, "CREATE TABLESAMPLE METHOD %s (\n",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(q, " INIT = %s,\n", tsminit);
+ appendPQExpBuffer(q, " NEXTTUPLE = %s,\n", tsmnexttuple);
+ appendPQExpBuffer(q, " NEXTBLOCK = %s,\n", tsmnextblock);
+ appendPQExpBuffer(q, " END = %s,\n", tsmend);
+ appendPQExpBuffer(q, " RESET = %s,\n", tsmreset);
+ appendPQExpBuffer(q, " COST = %s", tsmcost);
+
+ if (strcmp(tsmexaminetuple, "-") != 0)
+ appendPQExpBuffer(q, ",\n EXAMINETUPLE = %s", tsmexaminetuple);
+
+ if (tsminfo->tsmseqscan)
+ appendPQExpBufferStr(q, ",\n SEQSCAN = true");
+
+ if (tsminfo->tsmpagemode)
+ appendPQExpBufferStr(q, ",\n PAGEMODE = true");
+
+ appendPQExpBufferStr(q, "\n);");
+
+ appendPQExpBuffer(delq, "DROP TABLESAMPLE METHOD %s",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(labelq, "TABLESAMPLE METHOD %s",
+ fmtId(tsminfo->dobj.name));
+
+ if (dopt->binary_upgrade)
+ binary_upgrade_extension_member(q, &tsminfo->dobj, labelq->data);
+
+ ArchiveEntry(fout, tsminfo->dobj.catId, tsminfo->dobj.dumpId,
+ tsminfo->dobj.name,
+ NULL,
+ NULL,
+ "",
+ false, "TABLESAMPLE METHOD", SECTION_PRE_DATA,
+ q->data, delq->data, NULL,
+ NULL, 0,
+ NULL, NULL);
+
+ /* Dump Tablesample Method Comments */
+ dumpComment(fout, dopt, labelq->data,
+ NULL, "",
+ tsminfo->dobj.catId, 0, tsminfo->dobj.dumpId);
+ PQclear(res);
+ destroyPQExpBuffer(query);
+ destroyPQExpBuffer(q);
+ destroyPQExpBuffer(delq);
+ destroyPQExpBuffer(labelq);
+}
+
+/*
* dumpTSParser
* write out a single text search parser
*/
@@ -15659,6 +15835,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
case DO_FDW:
case DO_FOREIGN_SERVER:
case DO_BLOB:
+ case DO_TABLESAMPLE_METHOD:
/* Pre-data objects: must come before the pre-data boundary */
addObjectDependency(preDataBound, dobj->dumpId);
break;
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index a9d3c10..87bef24 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -76,7 +76,8 @@ typedef enum
DO_POST_DATA_BOUNDARY,
DO_EVENT_TRIGGER,
DO_REFRESH_MATVIEW,
- DO_POLICY
+ DO_POLICY,
+ DO_TABLESAMPLE_METHOD
} DumpableObjectType;
typedef struct _dumpableObject
@@ -383,6 +384,13 @@ typedef struct _inhInfo
Oid inhparent; /* OID of its parent */
} InhInfo;
+typedef struct _tsmInfo
+{
+ DumpableObject dobj;
+ bool tsmseqscan;
+ bool tsmpagemode;
+} TSMInfo;
+
typedef struct _prsInfo
{
DumpableObject dobj;
@@ -536,6 +544,7 @@ extern ProcLangInfo *getProcLangs(Archive *fout, int *numProcLangs);
extern CastInfo *getCasts(Archive *fout, DumpOptions *dopt, int *numCasts);
extern void getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tbinfo, int numTables);
extern bool shouldPrintColumn(DumpOptions *dopt, TableInfo *tbinfo, int colno);
+extern TSMInfo *getTableSampleMethods(Archive *fout, int *numTSMs);
extern TSParserInfo *getTSParsers(Archive *fout, int *numTSParsers);
extern TSDictInfo *getTSDictionaries(Archive *fout, int *numTSDicts);
extern TSTemplateInfo *getTSTemplates(Archive *fout, int *numTSTemplates);
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index c5ed593..9567cf6 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -73,7 +73,8 @@ static const int oldObjectTypePriority[] =
13, /* DO_POST_DATA_BOUNDARY */
20, /* DO_EVENT_TRIGGER */
15, /* DO_REFRESH_MATVIEW */
- 21 /* DO_POLICY */
+ 21, /* DO_POLICY */
+ 5 /* DO_TABLESAMPLE_METHOD */
};
/*
@@ -122,7 +123,8 @@ static const int newObjectTypePriority[] =
25, /* DO_POST_DATA_BOUNDARY */
32, /* DO_EVENT_TRIGGER */
33, /* DO_REFRESH_MATVIEW */
- 34 /* DO_POLICY */
+ 34, /* DO_POLICY */
+ 17 /* DO_TABLESAMPLE_METHOD */
};
static DumpId preDataBoundId;
@@ -1460,6 +1462,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
"POLICY (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
+ case DO_TABLESAMPLE_METHOD:
+ snprintf(buf, bufsize,
+ "TABLESAMPLE METHOD %s (ID %d OID %u)",
+ obj->name, obj->dumpId, obj->catId.oid);
+ return;
case DO_PRE_DATA_BOUNDARY:
snprintf(buf, bufsize,
"PRE-DATA BOUNDARY (ID %d)",
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 6481ac8..30653f8 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -148,6 +148,7 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
+ OCLASS_TABLESAMPLEMETHOD, /* pg_tablesample_method */
MAX_OCLASS /* MUST BE LAST */
} ObjectClass;
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
index a58e1cf..82c15f3 100644
--- a/src/include/catalog/pg_tablesample_method.h
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -72,7 +72,17 @@ typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
DATA(insert OID = 3293 ( system false true tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
DESCR("SYSTEM table sampling method");
+#define TABLESAMPLE_METHOD_SYSTEM_OID 3293
DATA(insert OID = 3294 ( bernoulli true false tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
DESCR("BERNOULLI table sampling method");
+#define TABLESAMPLE_METHOD_BERNOULLI_OID 3294
+
+/* ----------------
+ * functions for manipulation of pg_tablesample_method
+ * ----------------
+ */
+
+extern ObjectAddress DefineTablesampleMethod(List *names, List *parameters);
+extern void RemoveTablesampleMethodById(Oid tsmoid);
#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index a4288d1..359119a 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1291,6 +1291,7 @@ typedef enum ObjectType
OBJECT_SEQUENCE,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
+ OBJECT_TABLESAMPLEMETHOD,
OBJECT_TABLESPACE,
OBJECT_TRIGGER,
OBJECT_TSCONFIGURATION,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index ae90df8..902c189 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -236,6 +236,7 @@ PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD)
+PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD)
PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("minvalue", MINVALUE, UNRESERVED_KEYWORD)
PG_KEYWORD("mode", MODE, UNRESERVED_KEYWORD)
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 78175c1..b1594bb 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -152,6 +152,7 @@ extern void free_attstatsslot(Oid atttype,
float4 *numbers, int nnumbers);
extern char *get_namespace_name(Oid nspid);
extern Oid get_range_subtype(Oid rangeOid);
+extern Oid get_tablesample_method_oid(const char *tsmname, bool missing_ok);
extern char *get_tablesample_method_name(Oid tsmid);
#define type_is_array(typid) (get_element_type(typid) != InvalidOid)
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 93d93af..37ea524 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -9,7 +9,8 @@ SUBDIRS = \
worker_spi \
dummy_seclabel \
test_shm_mq \
- test_parser
+ test_parser \
+ tablesample
all: submake-errcodes
diff --git a/src/test/modules/tablesample/.gitignore b/src/test/modules/tablesample/.gitignore
new file mode 100644
index 0000000..5dcb3ff
--- /dev/null
+++ b/src/test/modules/tablesample/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/tablesample/Makefile b/src/test/modules/tablesample/Makefile
new file mode 100644
index 0000000..469b004
--- /dev/null
+++ b/src/test/modules/tablesample/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/tablesample/Makefile
+
+MODULE_big = tsm_test
+OBJS = tsm_test.o $(WIN32RES)
+PGFILEDESC = "tsm_test - example of a custom tablesample method"
+
+EXTENSION = tsm_test
+DATA = tsm_test--1.0.sql
+
+REGRESS = tablesample
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/tablesample/expected/tablesample.out b/src/test/modules/tablesample/expected/tablesample.out
new file mode 100644
index 0000000..ad62e32
--- /dev/null
+++ b/src/test/modules/tablesample/expected/tablesample.out
@@ -0,0 +1,38 @@
+CREATE EXTENSION tsm_test;
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ c81e728d9d4c2f636f067f89cc14862c | 0.5
+ a87ff679a2f3e71d9181a67b7542122c | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ d3d9446802a44259755d38e6d163e820 | 0.5
+(6 rows)
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ e4da3b7fbbce2345d7772b0674a318d5 | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ c9f0f895fb98ab9159f51fd0297e236d | 0.5
+(5 rows)
+
+DROP TABLESAMPLE METHOD tsm_test;
+ERROR: cannot drop tablesample method tsm_test because extension tsm_test requires it
+HINT: You can drop extension tsm_test instead.
+DROP EXTENSION tsm_test;
+ERROR: cannot drop extension tsm_test because other objects depend on it
+DETAIL: view test_tsm_v depends on tablesample method tsm_test
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
+DROP EXTENSION tsm_test CASCADE;
+NOTICE: drop cascades to view test_tsm_v
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
+ tsmname | tsmseqscan | tsmpagemode | tsminit | tsmnextblock | tsmnexttuple | tsmexaminetuple | tsmend | tsmreset | tsmcost
+---------+------------+-------------+---------+--------------+--------------+-----------------+--------+----------+---------
+(0 rows)
+
diff --git a/src/test/modules/tablesample/sql/tablesample.sql b/src/test/modules/tablesample/sql/tablesample.sql
new file mode 100644
index 0000000..b1104d6
--- /dev/null
+++ b/src/test/modules/tablesample/sql/tablesample.sql
@@ -0,0 +1,14 @@
+CREATE EXTENSION tsm_test;
+
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+
+DROP TABLESAMPLE METHOD tsm_test;
+DROP EXTENSION tsm_test;
+DROP EXTENSION tsm_test CASCADE;
+
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
diff --git a/src/test/modules/tablesample/tsm_test--1.0.sql b/src/test/modules/tablesample/tsm_test--1.0.sql
new file mode 100644
index 0000000..e5a9ae8
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test--1.0.sql
@@ -0,0 +1,52 @@
+/* src/test/modules/tablesample/tsm_test--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION tsm_test" to load this file. \quit
+
+CREATE FUNCTION tsm_test_init(internal, int4, text)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nextblock(internal)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nexttuple(internal, int4, int2)
+RETURNS int2
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_examinetuple(internal, int4, internal, bool)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_end(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_reset(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_cost(internal, internal, internal, internal, internal, internal, internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+
+CREATE TABLESAMPLE METHOD tsm_test (
+ SEQSCAN = true,
+ PAGEMODE = true,
+ INIT = tsm_test_init,
+ NEXTBLOCK = tsm_test_nextblock,
+ NEXTTUPLE = tsm_test_nexttuple,
+ EXAMINETUPLE = tsm_test_examinetuple,
+ END = tsm_test_end,
+ RESET = tsm_test_reset,
+ COST = tsm_test_cost
+);
diff --git a/src/test/modules/tablesample/tsm_test.c b/src/test/modules/tablesample/tsm_test.c
new file mode 100644
index 0000000..be4dcb9
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.c
@@ -0,0 +1,228 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_test.c
+ * Simple example of a custom tablesample method
+ *
+ * Copyright (c) 2007-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/tablesample/tsm_test.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "access/relscan.h"
+#include "access/tupdesc.h"
+#include "catalog/pg_type.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/sampling.h"
+#include "utils/tablesample.h"
+
+PG_MODULE_MAGIC;
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ AttrNumber attnum; /* column to check */
+ TupleDesc tupDesc; /* tuple descriptor of table */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} tsm_test_state;
+
+
+PG_FUNCTION_INFO_V1(tsm_test_init);
+PG_FUNCTION_INFO_V1(tsm_test_nextblock);
+PG_FUNCTION_INFO_V1(tsm_test_nexttuple);
+PG_FUNCTION_INFO_V1(tsm_test_examinetuple);
+PG_FUNCTION_INFO_V1(tsm_test_end);
+PG_FUNCTION_INFO_V1(tsm_test_reset);
+PG_FUNCTION_INFO_V1(tsm_test_cost);
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_test_init(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ char *attname;
+ AttrNumber attnum;
+ Oid atttype;
+ Relation rel = scanstate->ss.ss_currentRelation;
+ HeapScanDesc scan = scanstate->ss.ss_currentScanDesc;
+ tsm_test_state *state;
+ TupleDesc tupDesc = RelationGetDescr(rel);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("attnum cannot be NULL.")));
+
+ attname = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+ attnum = get_attnum(rel->rd_id, attname);
+ if (!AttrNumberIsForUserDefinedAttr(attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s does not exist.", attname)));
+
+ atttype = get_atttype(rel->rd_id, attnum);
+ if (atttype != FLOAT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s is not of type float.", attname)));
+
+ state = palloc0(sizeof(tsm_test_state));
+
+ /* Remember initial values for reinit */
+ state->seed = seed;
+ state->attnum = attnum;
+ state->tupDesc = tupDesc;
+ state->startblock = scan->rs_startblock;
+ state->nblocks = scan->rs_nblocks;
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+ sampler_random_init_state(state->seed, state->randstate);
+
+ scanstate->tsmdata = (void *) state;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_test_nextblock(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ /* Cycle from startblock to startblock to support syncscan. */
+ if (state->blockno == InvalidBlockNumber)
+ state->blockno = state->startblock;
+ else
+ {
+ state->blockno++;
+
+ if (state->blockno >= state->nblocks)
+ state->blockno = 0;
+
+ if (state->blockno == state->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(state->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ */
+Datum
+tsm_test_nexttuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ /* Advance; reset to Invalid past the page end so the next block starts fresh. */
+ state->lt = (state->lt == InvalidOffsetNumber) ? FirstOffsetNumber : state->lt + 1;
+ if (state->lt > maxoffset)
+ state->lt = InvalidOffsetNumber;
+
+ PG_RETURN_UINT16(state->lt);
+}
+
+/*
+ * Examine tuple and decide if it should be returned.
+ */
+Datum
+tsm_test_examinetuple(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ HeapTuple tuple = (HeapTuple) PG_GETARG_POINTER(2);
+ bool visible = PG_GETARG_BOOL(3);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+ bool isnull;
+ float8 val, rand;
+
+ if (!visible)
+ PG_RETURN_BOOL(false);
+
+ val = DatumGetFloat8(heap_getattr(tuple, state->attnum, state->tupDesc, &isnull));
+ rand = sampler_random_fract(state->randstate);
+ if (isnull || val < rand)
+ PG_RETURN_BOOL(false);
+ else
+ PG_RETURN_BOOL(true);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_test_end(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+
+ pfree(scanstate->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_test_reset(PG_FUNCTION_ARGS)
+{
+ SampleScanState *scanstate = (SampleScanState *) PG_GETARG_POINTER(0);
+ tsm_test_state *state = (tsm_test_state *) scanstate->tsmdata;
+
+ state->blockno = InvalidBlockNumber;
+ state->lt = InvalidOffsetNumber;
+
+ sampler_random_init_state(state->seed, state->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_test_cost(PG_FUNCTION_ARGS)
+{
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+
+ *pages = baserel->pages;
+
+ /* This is a very bad estimate */
+ *tuples = path->rows = path->rows/2;
+
+ PG_RETURN_VOID();
+}
+
diff --git a/src/test/modules/tablesample/tsm_test.control b/src/test/modules/tablesample/tsm_test.control
new file mode 100644
index 0000000..a7b2741
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.control
@@ -0,0 +1,5 @@
+# tsm_test extension
+comment = 'test module for custom tablesample method'
+default_version = '1.0'
+module_pathname = '$libdir/tsm_test'
+relocatable = true
--
1.9.1
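For readers who want to check the syncscan-style block cycling used by tsm_test_nextblock outside the server, here is a minimal standalone sketch. It is not part of the patch: the block_cursor type and its two functions are hypothetical names, and BlockNumber/InvalidBlockNumber are local stand-ins that only mirror the PostgreSQL definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins (assumptions) mirroring PostgreSQL's definitions. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Hypothetical cursor holding the same three fields the test module tracks. */
typedef struct
{
	BlockNumber startblock;		/* where the scan starts */
	BlockNumber nblocks;		/* total blocks in the relation */
	BlockNumber blockno;		/* current block, or InvalidBlockNumber */
} block_cursor;

static void
block_cursor_init(block_cursor *c, BlockNumber startblock, BlockNumber nblocks)
{
	/* This sketch assumes a non-empty relation and a valid start block. */
	assert(nblocks > 0 && startblock < nblocks);
	c->startblock = startblock;
	c->nblocks = nblocks;
	c->blockno = InvalidBlockNumber;
}

/*
 * Return the next block, cycling from startblock around the end of the
 * relation and back to startblock; InvalidBlockNumber ends the cycle.
 */
static BlockNumber
block_cursor_next(block_cursor *c)
{
	if (c->blockno == InvalidBlockNumber)
		c->blockno = c->startblock;
	else
	{
		c->blockno++;

		/* Wrap around at the end of the relation. */
		if (c->blockno >= c->nblocks)
			c->blockno = 0;

		/* Back at the start: every block has been visited once. */
		if (c->blockno == c->startblock)
			return InvalidBlockNumber;
	}

	return c->blockno;
}
```

With startblock = 2 and nblocks = 4 the cursor yields 2, 3, 0, 1 and then InvalidBlockNumber. As in the test module, the cursor is not reset when the cycle ends, so a caller that wants to scan again has to reinitialize it (the reset callback does this in tsm_test).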
Attachment: 0004-tablesample-api-doc-v1.patch (text/x-diff)
From 53c567c6b456bf30c8f7c580af4ed6fcf59d4eea Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sun, 15 Mar 2015 17:39:22 +0100
Subject: [PATCH 4/4] tablesample api doc v1
---
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/postgres.sgml | 1 +
doc/src/sgml/tablesample-method.sgml | 169 +++++++++++++++++++++++++++++++++++
3 files changed, 171 insertions(+)
create mode 100644 doc/src/sgml/tablesample-method.sgml
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89fff77..23d932d 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -98,6 +98,7 @@
<!ENTITY protocol SYSTEM "protocol.sgml">
<!ENTITY sources SYSTEM "sources.sgml">
<!ENTITY storage SYSTEM "storage.sgml">
+<!ENTITY tablesample-method SYSTEM "tablesample-method.sgml">
<!-- contrib information -->
<!ENTITY contrib SYSTEM "contrib.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index e378d69..dc1f020 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -250,6 +250,7 @@
&gin;
&brin;
&storage;
+ &tablesample-method;
&bki;
&planstats;
diff --git a/doc/src/sgml/tablesample-method.sgml b/doc/src/sgml/tablesample-method.sgml
new file mode 100644
index 0000000..2d6d323
--- /dev/null
+++ b/doc/src/sgml/tablesample-method.sgml
@@ -0,0 +1,169 @@
+<!-- doc/src/sgml/tablesample-method.sgml -->
+
+<chapter id="tablesample-method">
+ <title>Writing A TABLESAMPLE Sampling Method</title>
+
+ <indexterm zone="tablesample-method">
+ <primary>tablesample method</primary>
+ </indexterm>
+
+ <para>
+ The <command>TABLESAMPLE</command> clause implementation in
+ <productname>PostgreSQL</> supports creating custom sampling methods.
+ These methods control what sample of the table will be returned when the
+ <command>TABLESAMPLE</command> clause is used.
+ </para>
+
+ <sect1 id="tablesample-method-functions">
+ <title>Tablesample Method Functions</title>
+
+ <para>
+ The tablesample method must provide the following set of functions:
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_init (SampleScanState *scanstate,
+ uint32 seed, ...);
+</programlisting>
+ Initialize the tablesample scan. The function is called at the beginning
+ of each relation scan.
+ </para>
+ <para>
+ Note that the first two parameters are required, but you can specify
+ additional parameters, which the <command>TABLESAMPLE</> clause then uses
+ to determine the user input required in the query itself.
+ This means that if your function specifies an additional float4 parameter
+ named percent, the user will have to call the tablesample method with an
+ expression which evaluates (or can be coerced) to float4.
+ For example, this definition:
+<programlisting>
+tsm_init (SampleScanState *scanstate,
+ uint32 seed, float4 pct);
+</programlisting>
+will lead to an SQL call like this:
+<programlisting>
+... TABLESAMPLE yourmethod(0.5) ...
+</programlisting>
+ </para>
+
+ <para>
+<programlisting>
+BlockNumber
+tsm_nextblock (SampleScanState *scanstate);
+</programlisting>
+ Returns the block number of the next page to be scanned. InvalidBlockNumber
+ should be returned if the sampling has reached the end of the relation.
+ </para>
+
+ <para>
+<programlisting>
+OffsetNumber
+tsm_nexttuple (SampleScanState *scanstate, BlockNumber blockno,
+ OffsetNumber maxoffset);
+</programlisting>
+ Returns the next tuple offset for the current page. InvalidOffsetNumber should
+ be returned if the sampling has reached the end of the page.
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_end (SampleScanState *scanstate);
+</programlisting>
+ The scan has finished; clean up any leftover state.
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_reset (SampleScanState *scanstate);
+</programlisting>
+ The scan needs to rescan the relation; reset any tablesample method
+ state.
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_cost (PlannerInfo *root, Path *path, RelOptInfo *baserel,
+ List *args, BlockNumber *pages, double *tuples);
+</programlisting>
+ This function is used by the optimizer to choose the best plan and is also
+ used for the output of <command>EXPLAIN</>.
+ </para>
+
+ <para>
+ There is one more function that a tablesample method can implement in order
+ to gain more fine-grained control over sampling. This function is optional:
+ </para>
+
+ <para>
+<programlisting>
+bool
+tsm_examinetuple (SampleScanState *scanstate, BlockNumber blockno,
+ HeapTuple tuple, bool visible);
+</programlisting>
+ This function enables the sampling method to examine the contents of the tuple
+ (for example, to collect some internal statistics). The return value of this
+ function determines whether the tuple is returned to the client.
+ Note that this function also receives invisible tuples, but it is not
+ allowed to return true for such a tuple (if it does,
+ <productname>PostgreSQL</> will raise an error).
+ </para>
+
+ <para>
+ As you can see, most of the tablesample method interfaces take the
+ <structname>SampleScanState</> as their first parameter. This structure holds
+ the state of the current scan and also provides storage for the tablesample
+ method's state. It is defined as follows:
+<programlisting>
+typedef struct SampleScanState
+{
+ ScanState ss;
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmexaminetuple;
+ FmgrInfo tsmend;
+ FmgrInfo tsmreset;
+ void *tsmdata;
+} SampleScanState;
+</programlisting>
+ Here <structfield>ss</> is the <structname>ScanState</> itself. From it, you
+ can get <structfield>ss_currentRelation</> (the currently scanned relation) and
+ <structfield>ss_currentScanDesc</> (information about the scan).
+ These are usually useful for the <function>tsm_init</> function.
+ The <structfield>tsminit</>, <structfield>tsmnextblock</>,
+ <structfield>tsmnexttuple</>, <structfield>tsmexaminetuple</>,
+ <structfield>tsmend</> and <structfield>tsmreset</> fields are pointers to the
+ tablesample method functions for use by the sample scan itself; the
+ tablesample method does not need to be concerned with these values. The
+ <structfield>tsmdata</> field can be used by the tablesample method to store
+ any state information it might need during the scan.
+ </para>
+ </sect1>
+
+ <sect1 id="tablesample-method-sql">
+ <title>Tablesample Method Installation</title>
+
+ <para>
+ Once you have written and built the custom tablesample method, you can
+ install it using the SQL statement
+ <xref linkend="sql-createtablesamplemethod"> and remove it again using
+ <xref linkend="sql-droptablesamplemethod">.
+ </para>
+
+ </sect1>
+
+ <sect1 id="tablesample-method-example">
+ <title>Tablesample Method Example</title>
+
+ <para>
+ An example of how to implement a custom tablesample method can be found in the
+ <productname>PostgreSQL</> sources under the
+ <filename>src/test/modules/tablesample</> directory.
+ </para>
+ </sect1>
+
+</chapter>
--
1.9.1
On Mon, Apr 6, 2015 at 11:32 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 06/04/15 14:30, Petr Jelinek wrote:
On 06/04/15 11:02, Simon Riggs wrote:
Are we ready for a final detailed review and commit?
I plan to send v12 in the evening with some additional changes that came
up from Amit's comments + some improvements to error reporting. I think
it will be ready then.
Ok so here it is.
Changes vs v11:
- changed input parameter list to expr_list
- improved error reporting, particularly when input parameters are wrong
- fixed SELECT docs to show correct syntax and mention that there can be
more sampling methods
- added name of the sampling method to the explain output - I don't like
the code much there as it has to look into RTE but on the other hand I
don't want to create new scan node just so it can hold the name of the
sampling method for explain
- made views containing TABLESAMPLE clause not autoupdatable
- added PageIsAllVisible() check before trying to check for tuple
visibility
- some typo/white space fixes
The new changes look fine to me except for one typo.
+ The optional parameter <literal>REPEATABLE</literal> acceps any
number
+ or expression producing a number and is used as random seed for
+ sampling.
typo
/acceps/accepts
So the patch for implementation of SYSTEM and BERNOULLI TABLESAMPLE
methods is "Ready For Committer". I could not get chance to review the
DDL patch (patch to implement user defined TABLESAMPLE methods).
Note to committer: none of us is very clear on how the TABLESAMPLE
clause should behave in UPDATE/DELETE statements. I am of the opinion
that we should either support UPDATE/DELETE based on the TABLESAMPLE
clause or prohibit it in all cases; however, Peter thinks it is okay to
support it in the FROM and USING clauses of UPDATE and DELETE,
respectively.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 06/04/15 14:30, Petr Jelinek wrote:
On 06/04/15 11:02, Simon Riggs wrote:
Are we ready for a final detailed review and commit?
I plan to send v12 in the evening with some additional changes that came
up from Amit's comments + some improvements to error reporting. I think
it will be ready then.
Ok so here it is.
Changes vs v11:
- changed input parameter list to expr_list
- improved error reporting, particularly when input parameters are wrong
- fixed SELECT docs to show correct syntax and mention that there can be
more sampling methods
- added name of the sampling method to the explain output - I don't like
the code much there as it has to look into RTE but on the other hand I
don't want to create new scan node just so it can hold the name of the
sampling method for explain
- made views containing TABLESAMPLE clause not autoupdatable
- added PageIsAllVisible() check before trying to check for tuple
visibility
- some typo/white space fixes
Compiler warnings:
explain.c: In function 'ExplainNode':
explain.c:861: warning: 'sname' may be used uninitialized in this function
Docs spellings:
"PostgreSQL distrribution" extra r.
"The optional parameter REPEATABLE acceps" accepts. But I don't know
that 'accepts' is the right word. It makes the seed value sound optional
to REPEATABLE.
"each block having same chance" should have "the" before "same".
"Both of those sampling methods currently...". I think it should be
"these" not "those", as this sentence is immediately after their
introduction, not at a distance.
"...tuple contents and decides if to return in, or zero if none" Something
here is confusing. "return it", not "return in"?
Other comments:
Do we want tab completions for psql? (I will never remember how to spell
BERNOULLI).
Needs a Cat version bump.
Cheers,
Jeff
On Thu, Apr 9, 2015 at 4:30 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 06/04/15 14:30, Petr Jelinek wrote:
On 06/04/15 11:02, Simon Riggs wrote:
Are we ready for a final detailed review and commit?
I plan to send v12 in the evening with some additional changes that came
up from Amit's comments + some improvements to error reporting. I think
it will be ready then.
Ok so here it is.
Changes vs v11:
- changed input parameter list to expr_list
- improved error reporting, particularly when input parameters are wrong
- fixed SELECT docs to show correct syntax and mention that there can be
more sampling methods
- added name of the sampling method to the explain output - I don't like
the code much there as it has to look into RTE but on the other hand I don't
want to create new scan node just so it can hold the name of the sampling
method for explain
- made views containing TABLESAMPLE clause not autoupdatable
- added PageIsAllVisible() check before trying to check for tuple
visibility
- some typo/white space fixes
Compiler warnings:
explain.c: In function 'ExplainNode':
explain.c:861: warning: 'sname' may be used uninitialized in this function
Docs spellings:
"PostgreSQL distrribution" extra r.
"The optional parameter REPEATABLE acceps" accepts. But I don't know that
'accepts' is the right word. It makes the seed value sound optional to
REPEATABLE.
"each block having same chance" should have "the" before "same".
"Both of those sampling methods currently...". I think it should be "these"
not "those", as this sentence is immediately after their introduction, not
at a distance.
"...tuple contents and decides if to return in, or zero if none" Something
here is confusing. "return it", not "return in"?
Other comments:
Do we want tab completions for psql? (I will never remember how to spell
BERNOULLI).
Yes. I think so.
Needs a Cat version bump.
The committer who will pick up this patch will normally do it.
Patch 1 is simple enough and looks fine to me.
Regarding patch 2...
I found for now some typos:
+ <title><structname>pg_tabesample_method</structname></title>
+ <productname>PostgreSQL</productname> distrribution:
Also, I am wondering if the sampling logic based on block analysis is
actually correct, for example for now this fails and I think that we
should support it:
=# with query_select as (select generate_series(1, 10) as a) select
query_select.a from query_select tablesample system (100.0/11)
REPEATABLE (9999);
ERROR: 42P01: relation "query_select" does not exist
How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
sample from a result set?
Thoughts?
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 9 April 2015 at 04:12, Michael Paquier <michael.paquier@gmail.com> wrote:
Also, I am wondering if the sampling logic based on block analysis is
actually correct, for example for now this fails and I think that we
should support it:
=# with query_select as (select generate_series(1, 10) as a) select
query_select.a from query_select tablesample system (100.0/11)
REPEATABLE (9999);
ERROR: 42P01: relation "query_select" does not exist
How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
sample from a result set?
Thoughts?
Good catch. The above query doesn't make any sense.
TABLESAMPLE SYSTEM implies system-defined sampling mechanism, which is
block level sampling. So any block level sampling method should be
barred from operating on a result set in this way... i.e. should
generate an "ERROR inappropriate sampling method specified"
TABLESAMPLE BERNOULLI could work in this case, or any other non-block
based sampling mechanism. Whether it does work yet is another matter.
This query should be part of the test suite and should generate a
useful message or work correctly.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services
On Thu, Apr 9, 2015 at 5:12 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Thu, Apr 9, 2015 at 4:30 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 06/04/15 14:30, Petr Jelinek wrote:
On 06/04/15 11:02, Simon Riggs wrote:
Are we ready for a final detailed review and commit?
I plan to send v12 in the evening with some additional changes that came
up from Amit's comments + some improvements to error reporting. I think
it will be ready then.
Ok so here it is.
Changes vs v11:
- changed input parameter list to expr_list
- improved error reporting, particularly when input parameters are wrong
- fixed SELECT docs to show correct syntax and mention that there can be
more sampling methods
- added name of the sampling method to the explain output - I don't like
the code much there as it has to look into RTE but on the other hand I don't
want to create new scan node just so it can hold the name of the sampling
method for explain
- made views containing TABLESAMPLE clause not autoupdatable
- added PageIsAllVisible() check before trying to check for tuple
visibility
- some typo/white space fixes
Compiler warnings:
explain.c: In function 'ExplainNode':
explain.c:861: warning: 'sname' may be used uninitialized in this function
Docs spellings:
"PostgreSQL distrribution" extra r.
"The optional parameter REPEATABLE acceps" accepts. But I don't know that
'accepts' is the right word. It makes the seed value sound optional to
REPEATABLE.
"each block having same chance" should have "the" before "same".
"Both of those sampling methods currently...". I think it should be "these"
not "those", as this sentence is immediately after their introduction, not
at a distance.
"...tuple contents and decides if to return in, or zero if none" Something
here is confusing. "return it", not "return in"?
Other comments:
Do we want tab completions for psql? (I will never remember how to spell
BERNOULLI).
Yes. I think so.
Needs a Cat version bump.
The committer who will pick up this patch will normally do it.
Patch 1 is simple enough and looks fine to me.
Regarding patch 2...
I found for now some typos:
+ <title><structname>pg_tabesample_method</structname></title>
+ <productname>PostgreSQL</productname> distrribution:
Also, I am wondering if the sampling logic based on block analysis is
actually correct, for example for now this fails and I think that we
should support it:
=# with query_select as (select generate_series(1, 10) as a) select
query_select.a from query_select tablesample system (100.0/11)
REPEATABLE (9999);
ERROR: 42P01: relation "query_select" does not exist
How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
sample from a result set?
Thoughts?
Just to be clear, since the example above is misleading... Doing table
sampling using SYSTEM at physical level makes sense. In this case I
think that we should properly error out when trying to use this method
on something not present at physical level. But I am not sure that
this restriction applies to BERNOULLI: you may want to apply it on
other things than physical relations, like views or results of WITH
clauses. Also, based on the fact that we support custom sampling
methods, I think that it should be up to the sampling method to define
on what kind of objects it supports sampling, and where it supports
sampling fetching, be it page-level fetching or analysis from an
existing set of tuples. Looking at the patch, TABLESAMPLE is only
allowed on tables and matviews; this limitation is too restrictive
IMO.
--
Michael
On Thu, Apr 9, 2015 at 5:52 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 9 April 2015 at 04:12, Michael Paquier <michael.paquier@gmail.com> wrote:
Also, I am wondering if the sampling logic based on block analysis is
actually correct, for example for now this fails and I think that we
should support it:
=# with query_select as (select generate_series(1, 10) as a) select
query_select.a from query_select tablesample system (100.0/11)
REPEATABLE (9999);
ERROR: 42P01: relation "query_select" does not exist
How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
sample from a result set?
Thoughts?
Good catch. The above query doesn't make any sense.
TABLESAMPLE SYSTEM implies system-defined sampling mechanism, which is
block level sampling. So any block level sampling method should be
barred from operating on a result set in this way... i.e. should
generate an "ERROR inappropriate sampling method specified"
TABLESAMPLE BERNOULLI could work in this case, or any other non-block
based sampling mechanism. Whether it does work yet is another matter.
This query should be part of the test suite and should generate a
useful message or work correctly.
Ahah, you just beat me on that ;) See a more precise reply below.
--
Michael
On 9 April 2015 at 04:52, Simon Riggs <simon@2ndquadrant.com> wrote:
TABLESAMPLE BERNOULLI could work in this case, or any other non-block
based sampling mechanism. Whether it does work yet is another matter.
This query should be part of the test suite and should generate a
useful message or work correctly.
The SQL Standard does allow the WITH query given. It makes no mention
of the obvious point that SYSTEM-defined mechanisms might not work,
but that is for the implementation to define, AFAICS.
The SQL Standard goes on to talk about "possibly non-deterministic"
issues. Which in Postgres relates to the point that the results of a
SampleScan will never be IMMUTABLE. That raises the possibility of
planner issues. We must, for example, never do inner join removal on a
sampled relation - we don't do that yet, but something to watch for.
On balance, in this release, I would be happier to exclude sampled
results from queries, and only allow sampling against base tables.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services
On 09/04/15 11:37, Simon Riggs wrote:
On 9 April 2015 at 04:52, Simon Riggs <simon@2ndquadrant.com> wrote:
TABLESAMPLE BERNOULLI could work in this case, or any other non-block
based sampling mechanism. Whether it does work yet is another matter.
This query should be part of the test suite and should generate a
useful message or work correctly.
The SQL Standard does allow the WITH query given. It makes no mention
of the obvious point that SYSTEM-defined mechanisms might not work,
but that is for the implementation to define, AFAICS.
Yes, the SQL Standard allows this, and the reason it doesn't define what
happens with SYSTEM is that it doesn't actually define how SYSTEM should
behave, except that it should return approximately the given percentage
of rows; the actual behavior is left to the DBMS. The reason other
databases like MSSQL or DB2 have chosen block sampling is that it makes
the most sense (and is fastest) on tables, and those databases don't
support TABLESAMPLE on anything else at all.
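To illustrate why block sampling only returns "approximately" the requested percentage, here is a minimal standalone sketch. It is not code from the patch: the function names, the fixed rows-per-block assumption and the simple LCG random generator are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PRNG: 64-bit LCG, returns a value in [0, 1). */
static double lcg_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double) (*state >> 11) / 9007199254740992.0;   /* divide by 2^53 */
}

/*
 * Block-level sampling sketch: each block is picked with probability
 * percent/100 and, once picked, contributes all of its rows.  Assumes
 * (for simplicity) that every block holds rows_per_block rows.
 */
size_t system_sample_rows(size_t nblocks, size_t rows_per_block,
                          double percent, uint64_t seed)
{
    uint64_t rng = seed;
    size_t   rows = 0;

    assert(percent >= 0.0 && percent <= 100.0);
    for (size_t b = 0; b < nblocks; b++)
    {
        if (lcg_uniform(&rng) < percent / 100.0)
            rows += rows_per_block;     /* whole block or nothing */
    }
    return rows;
}
```

The result is always a multiple of the block size, so the returned row count can only approximate the requested percentage, and the same seed always yields the same result.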
On balance, in this release, I would be happier to exclude sampled
results from queries, and only allow sampling against base tables.
I think so too, considering how late in the last CF we are. Especially
given my note about MSSQL and DB2 above.
In any case I don't see any fundamental issues with extending the
current implementation with subquery support. I think most of the
work there is actually in the parser/analyzer and planner. When the
source of the data is a subquery, the sampling methods will simply not
receive requests for the next blockid and the next tupleid from that
block; if they want to support a subquery as the source of sampling,
they will have to provide the examinetuple interface (which is already
there and optional; the test/example custom sampling method uses it).
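As a rough illustration of that filter-style mode, here is a standalone sketch in which only an examinetuple-like callback is invoked for each incoming row. It is not code from the patch: the BernoulliFilter struct, the function names and the LCG generator are invented for the example. The visible flag mirrors the tsm_examinetuple signature from the docs; rows coming from a subquery would presumably always be visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-scan state of a Bernoulli-style filter. */
typedef struct BernoulliFilter
{
    uint64_t rng;       /* PRNG state, seeded like a REPEATABLE seed */
    double   percent;   /* requested sample size, 0..100 */
} BernoulliFilter;

/* Hypothetical PRNG: 64-bit LCG, returns a value in [0, 1). */
static double next_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double) (*state >> 11) / 9007199254740992.0;   /* divide by 2^53 */
}

/* Decide per row; never accepts an invisible row. */
int bf_examinetuple(BernoulliFilter *f, int visible)
{
    double u = next_uniform(&f->rng);   /* always draw, keeps output seed-stable */

    if (!visible)
        return 0;
    return u < f->percent / 100.0;
}

/* Feed a stream of rows (no block or tuple ids) through the filter. */
size_t sample_stream(const int *visible, size_t nrows,
                     double percent, uint64_t seed)
{
    BernoulliFilter f = { seed, percent };
    size_t          kept = 0;

    assert(percent >= 0.0 && percent <= 100.0);
    for (size_t i = 0; i < nrows; i++)
        kept += (size_t) bf_examinetuple(&f, visible[i]);
    return kept;
}

/* Demo input: 8 made-up rows, 2 of them invisible. */
size_t demo_filter(double percent, uint64_t seed)
{
    static const int visible[8] = {1, 1, 0, 1, 1, 0, 1, 1};

    return sample_stream(visible, 8, percent, seed);
}
```

No block or offset ever appears in this path, which is why a block-level method could not be driven this way.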
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 4/9/15 5:02 AM, Michael Paquier wrote:
Just to be clear, the example above being misleading... Doing table
sampling using SYSTEM at physical level makes sense. In this case I
think that we should properly error out when trying to use this method
on something not present at physical level. But I am not sure that
this restriction applies to BERNOUILLI: you may want to apply it on
other things than physical relations, like views or results of WITH
clauses. Also, based on the fact that we support custom sampling
methods, I think that it should be up to the sampling method to define
on what kind of objects it supports sampling, and where it supports
sampling fetching, be it page-level fetching or analysis from an
existing set of tuples. Looking at the patch, TABLESAMPLE is just
allowed on tables and matviews, this limitation is too restrictive
IMO.
In the SQL standard, the TABLESAMPLE clause is attached to a table
expression (<table primary>), which includes table functions,
subqueries, CTEs, etc. In the proposed patch, it is attached to a table
name, allowing only an ONLY clause. So this is a significant deviation.
Obviously, doing block sampling on a physical table is a significant use
case, but we should be clear about which restrictions and tradeoffs we
are making now and in the future, especially if we are going to present
extension interfaces. The fact that physical tables are interchangeable
with other relation types, at least in data-reading contexts, is a
feature worth preserving.
It may be worth thinking about some examples of other sampling methods,
in order to get a better feeling for whether the interfaces are appropriate.
Earlier in the thread, someone asked about supporting specifying a
number of rows instead of percents. While not essential, that seems
pretty useful, but I wonder how that could be implemented later on if we
take the approach that the argument to the sampling method can be an
arbitrary quantity that is interpreted only by the method.
On 9 April 2015 at 15:30, Peter Eisentraut <peter_e@gmx.net> wrote:
On 4/9/15 5:02 AM, Michael Paquier wrote:
Just to be clear, the example above being misleading... Doing table
sampling using SYSTEM at physical level makes sense. In this case I
think that we should properly error out when trying to use this method
on something not present at physical level. But I am not sure that
this restriction applies to BERNOUILLI: you may want to apply it on
other things than physical relations, like views or results of WITH
clauses. Also, based on the fact that we support custom sampling
methods, I think that it should be up to the sampling method to define
on what kind of objects it supports sampling, and where it supports
sampling fetching, be it page-level fetching or analysis from an
existing set of tuples. Looking at the patch, TABLESAMPLE is just
allowed on tables and matviews, this limitation is too restrictive
IMO.
In the SQL standard, the TABLESAMPLE clause is attached to a table
expression (<table primary>), which includes table functions,
subqueries, CTEs, etc. In the proposed patch, it is attached to a table
name, allowing only an ONLY clause. So this is a significant deviation.
There is no deviation from the standard in the current patch.
Currently we are 100% unimplemented feature; the patch would move us
directly towards a fully implemented feature, perhaps reduce to fully
implemented.
Obviously, doing block sampling on a physical table is a significant use
case
Very significant use case, which this patch addresses. Query result
sampling would not be a very interesting use case and was not even
thought of without the SQL Standard.
, but we should be clear about which restrictions and tradeoffs were
are making now and in the future, especially if we are going to present
extension interfaces. The fact that physical tables are interchangeable
with other relation types, at least in data-reading contexts, is a
feature worth preserving.
Agreed.
This patch does nothing to change that interchangeability. There is no
restriction or removal of current query capability.
It looks trivial to make it work for query results also, but if it is
not, ISTM something that can be added in a later release.
It may be worth thinking about some examples of other sampling methods,
in order to get a better feeling for whether the interfaces are appropriate.
Earlier in the thread, someone asked about supporting specifying a
number of rows instead of percents. While not essential, that seems
pretty useful, but I wonder how that could be implemented later on if we
take the approach that the argument to the sampling method can be an
arbitrary quantity that is interpreted only by the method.
Not sure I understand that. The method could allow parameters of any unit.
Having a function-based implementation allows stratified sampling or
other approaches suited directly to user's data.
I don't think it's reasonable to force all methods to offer both limits
on numbers of rows or percentages. They may not be applicable.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services
On 09/04/15 21:30, Peter Eisentraut wrote:
In the SQL standard, the TABLESAMPLE clause is attached to a table
expression (<table primary>), which includes table functions,
subqueries, CTEs, etc. In the proposed patch, it is attached to a table
name, allowing only an ONLY clause. So this is a significant deviation.
I wouldn't call something that implements subset of standard a
deviation. Especially if other major dbs have chosen same approach
(afaik the only db that supports sampling over something besides
physical relations is Oracle but their sampling works slightly
differently than what standard has).
Obviously, doing block sampling on a physical table is a significant use
case, but we should be clear about which restrictions and tradeoffs were
are making now and in the future, especially if we are going to present
extension interfaces. The fact that physical tables are interchangeable
with other relation types, at least in data-reading contexts, is a
feature worth preserving.
Yes, but I don't think there is anything that prevents us from adding
this in the future. The sampling scan could be made able to read both
directly from heap and from an executor subnode, which is doable even if
it won't be extremely pretty (but it should be easy to encapsulate into 2
internal interfaces as the heap reading is encapsulated to 1 internal
interface already). Another approach would be having two different
executor nodes - SamplingScan and SamplingFilter - and letting the planner
pick one depending on what is the source for the TABLESAMPLE clause.
The extension api is currently mainly:
nextblock - gets next blockid to read from heap
nexttuple - gets next tupleid to read from the current block
examinetuple - lets the extension decide if the tuple should indeed be
returned (this one is optional)
For the executor node reading we'd probably just call the examinetuple
as there are no block ids or tuple ids there. This makes the API look
slightly schizophrenic but on the other hand it gives the plugins
control over how physical relation is read if that's indeed the source.
And I guess we could let the plugin specify if it supports the heap
access (nextblock/nexttuple) and if it doesn't then planner would always
choose SamplingFilter over SequentialScan for physical relation instead
of SamplingScan.
All of this is possible to add without breaking compatibility with what
is proposed for commit currently.
The reasons why we need the nextblock and nexttuple interfaces and the
ability to read the heap directly are a) block sampling can't be done by
reading from another executor node, b) performance.
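For illustration, here is a minimal standalone sketch of how a scan could drive these three callbacks. It is not the patch's executor code: the SampleMethod struct, the "every other block" toy method and all names below are made up, and plain integers stand in for BlockNumber and OffsetNumber.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's BlockNumber / OffsetNumber. */
typedef uint32_t BlockNo;
typedef uint16_t OffsetNo;
#define INVALID_BLOCK  ((BlockNo) 0xFFFFFFFF)
#define INVALID_OFFSET ((OffsetNo) 0)      /* valid offsets start at 1 */

/* Hypothetical callback table modelled on nextblock/nexttuple/examinetuple. */
typedef struct SampleMethod
{
    BlockNo  (*nextblock) (void *state);
    OffsetNo (*nexttuple) (void *state, BlockNo blockno, OffsetNo maxoffset);
    int      (*examinetuple) (void *state, BlockNo blockno,
                              OffsetNo offset, int visible);  /* optional */
} SampleMethod;

/* Toy method: visit every other block and return all tuples on it. */
typedef struct EveryOtherState
{
    BlockNo  nblocks;   /* size of the relation in blocks */
    BlockNo  next;      /* next block to hand out */
    OffsetNo lastoff;   /* last offset returned for the current block */
} EveryOtherState;

static BlockNo eo_nextblock(void *state)
{
    EveryOtherState *s = state;
    BlockNo          blk;

    if (s->next >= s->nblocks)
        return INVALID_BLOCK;           /* end of relation */
    blk = s->next;
    s->next += 2;
    s->lastoff = 0;
    return blk;
}

static OffsetNo eo_nexttuple(void *state, BlockNo blockno, OffsetNo maxoffset)
{
    EveryOtherState *s = state;

    (void) blockno;
    if (s->lastoff >= maxoffset)
        return INVALID_OFFSET;          /* end of page */
    return ++s->lastoff;
}

/* Driver loop: the scan asks for blocks, then for tuples on each block. */
size_t sample_scan(const SampleMethod *m, void *state, OffsetNo tuples_per_block)
{
    size_t  returned = 0;
    BlockNo blk;

    assert(m->nextblock != NULL && m->nexttuple != NULL);
    while ((blk = m->nextblock(state)) != INVALID_BLOCK)
    {
        OffsetNo off;

        while ((off = m->nexttuple(state, blk, tuples_per_block)) != INVALID_OFFSET)
        {
            /* examinetuple is optional; every tuple is treated as visible here */
            if (m->examinetuple == NULL || m->examinetuple(state, blk, off, 1))
                returned++;
        }
    }
    return returned;
}

size_t demo_every_other(BlockNo nblocks, OffsetNo tuples_per_block)
{
    EveryOtherState st = { nblocks, 0, 0 };
    SampleMethod    m = { eo_nextblock, eo_nexttuple, NULL };

    return sample_scan(&m, &st, tuples_per_block);
}
```

With 10 blocks of 4 tuples each, this toy method visits blocks 0, 2, 4, 6 and 8 and returns 20 tuples.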
It may be worth thinking about some examples of other sampling methods,
in order to get a better feeling for whether the interfaces are appropriate.
There is one additional method which is just purely for testing the
interface and that uses column value to determine if the tuple should be
returned or not (which is useless in practice obviously as you can do
that using WHERE, it just shows how to use the interface).
I would like to eventually have something that's time limited rather
than size limited for example. I didn't think much about other sampling
algorithms but Simon proposed some and they should work with the current
API.
Earlier in the thread, someone asked about supporting specifying a
number of rows instead of percents. While not essential, that seems
pretty useful, but I wonder how that could be implemented later on if we
take the approach that the argument to the sampling method can be an
arbitrary quantity that is interpreted only by the method.
Well, you can have two approaches to this, either allow some specific
set of keywords that can be used to specify limit, or you let sampling
methods interpret parameters, I believe the latter is more flexible.
There is nothing stopping somebody writing sampling method which takes
limit as number of rows, or anything else.
Also for example for BERNOULLI to work correctly you'd need to convert
the number of rows to fraction of table anyway (and that's exactly what
the one database which has this feature does internally) and then it's
no different than passing (SELECT 100/reltuples*number_of_rows FROM
tablename) as a parameter.
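The conversion mentioned above is simple arithmetic; a small sketch follows. The function name and the clamping to 100 are assumptions made for the example, not something the patch provides.

```c
#include <assert.h>

/*
 * Convert a wanted number of rows into the percentage a percent-based
 * method such as BERNOULLI expects, following 100/reltuples*number_of_rows.
 * reltuples is the (estimated) row count of the table.
 */
double rows_to_percent(double reltuples, double wanted_rows)
{
    double pct;

    assert(wanted_rows >= 0.0);
    if (reltuples <= 0.0)
        return 100.0;           /* assumption: empty/unknown table, sample all */
    pct = 100.0 / reltuples * wanted_rows;
    return pct > 100.0 ? 100.0 : pct;   /* assumption: cap at the whole table */
}
```

For example, asking for 50 rows out of an estimated 200 gives 25 percent. Since BERNOULLI picks rows independently, the actual number of rows returned would still only be close to the requested count, not exactly equal to it.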
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Apr 10, 2015 at 9:58 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 09/04/15 21:30, Peter Eisentraut wrote:
In the SQL standard, the TABLESAMPLE clause is attached to a table
expression (<table primary>), which includes table functions,
subqueries, CTEs, etc. In the proposed patch, it is attached to a table
name, allowing only an ONLY clause. So this is a significant deviation.I wouldn't call something that implements subset of standard a deviation.
Especially if other major dbs have chosen same approach (afaik the only db
that supports sampling over something besides physical relations is Oracle
but their sampling works slightly differently than what standard has).Obviously, doing block sampling on a physical table is a significant use
case, but we should be clear about which restrictions and tradeoffs were
are making now and in the future, especially if we are going to present
extension interfaces. The fact that physical tables are interchangeable
with other relation types, at least in data-reading contexts, is a
feature worth preserving.Yes, but I don't think there is anything that prevents us from adding this
in the future. The sampling scan could made to be able to read both directly
from heap and from executor subnode which is doable even if it won't be
extremely pretty (but it should be easy to encapsulate into 2 internal
interfaces as the heap reading is encapsulated to 1 internal interface
already). Another approach would be having two different executor nodes -
SampingScan and SamplingFilter and letting planner pick one depending on
what is the source for TABLESAMPLE clause.The extension api is currently mainly:
nextblock - gets next blockid to read from heap
nextuple - gets next tupleid to read current block
examinetuple - lets the extension to decide if tuple should be indeed
returned (this one is optional)For the executor node reading we'd probably just call the examinetuple as
there are no block ids or tuple ids there. This makes the API look slightly
schizophrenic but on the other hand it gives the plugins control over how
physical relation is read if that's indeed the source. And I guess we could
let the plugin specify if it supports the heap access (nextblock/nexttuple)
and if it doesn't then planner would always choose SamplingFilter over
SequentialScan for physical relation instead of SamplingScan.All of this is possible to add without breaking compatibility with what is
proposed for commit currently.The reasons why we need the nextblock and nexttuple interfaces and the
ability to read the heap directly are a) block sampling can't be done by
reading from another executor node, b) performance.It may be worth thinking about some examples of other sampling methods,
in order to get a better feeling for whether the interfaces are
appropriate.There is one additional method which is just purely for testing the
interface and that uses column value to determine if the tuple should be
returned or not (which is useless in practice obviously as you can do that
using WHERE, it just shows how to use the interface).I would like to eventually have something that's time limited rather than
size limited for example. I didn't think much about other sampling
algorithms but Simon proposed some and they should work with the current
API.Earlier in the thread, someone asked about supporting specifying a
number of rows instead of percents. While not essential, that seems
pretty useful, but I wonder how that could be implemented later on if we
take the approach that the argument to the sampling method can be an
arbitrary quantity that is interpreted only by the method.Well, you can have two approaches to this, either allow some specific set of
keywords that can be used to specify limit, or you let sampling methods
interpret parameters, I believe the latter is more flexible. There is
nothing stopping somebody writing sampling method which takes limit as
number of rows, or anything else.
Also for example for BERNOULLI to work correctly you'd need to convert the
number of rows to fraction of table anyway (and that's exactly what the one
database which has this feature does internally) and then it's no different
than passing (SELECT 100/reltuples*number_of_rows FROM tablename) as a
parameter.
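The conversion mentioned above can be sketched in C as follows. This is only an illustration: the function name rows_to_sample_percent is hypothetical and not part of the patch, and reltuples stands for the pg_class estimate, so the result is only as accurate as the last ANALYZE.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): turn a target row count
 * into the percentage argument passed to TABLESAMPLE.
 *
 * reltuples is assumed to be the planner's row estimate for the table,
 * so the returned percentage is an approximation.
 */
static double
rows_to_sample_percent(double reltuples, double target_rows)
{
    double percent;

    /* No usable estimate or nothing requested: sample nothing. */
    if (reltuples <= 0.0 || target_rows <= 0.0)
        return 0.0;

    percent = 100.0 * target_rows / reltuples;

    /* The patch rejects sample sizes outside 0..100, so clamp. */
    return (percent > 100.0) ? 100.0 : percent;
}
```

Even with an exact row count, feeding this percentage to BERNOULLI yields only approximately the requested number of rows, because each row is still sampled independently.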
Mentioning again that patch 1 is interesting as a separate change to
move the sampling logic out of the ANALYZE code in its own portion.
I had a look at patch 2. Here are some comments:
1)
+ <row>
+ <entry><structfield>tsmseqscan</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method scan the whole table sequentially?
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmpagemode</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method always read whole pages?
+ </entry>
+ </row>
I think that those descriptions using question marks are not adapted.
They could be reformulated as follows:
If true, the sampling method scans the whole table sequentially.
If true, the sampling method always reads the pages completely.
2) Shouldn't there be some EXPLAIN output in the regression tests?
3) The documentation should clearly state that TABLESAMPLE can only be
used on matviews and tables, and can only accept items directly
referenced in a FROM clause, aka no WITH or no row subsets in a
subquery. As things stand, TABLESAMPLE being mentioned in the line of
ONLY, users may think that views are supported as ONLY can be used
with views as well.
4)
+ The <literal>BERNOULLI</literal> scans whole table and returns
+ individual rows with equal probability. Additional sampling methods
+ may be installed in the database via extensions.
In this patch extensions are mentioned but this is implemented only in
patch 3. Hence this reference should be removed.
5)
- * whether a nondefault buffer access strategy can be used, and whether
+ * whether a nondefault buffer access strategy can be used and whether
Noise here?
6) If the sample method strategies are extended at some point, I think
that you want to use a bitmap in heap_beginscan_ss and friends and not
a set of boolean arguments.
7) This is surprising and I can't understand why things are mixed up here:
- scan->rs_pageatatime = IsMVCCSnapshot(snapshot);
+ scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(snapshot);
Isn't what you want here a different parameter? I am sure that we do
not want to mix up visibility with MVCC snapshots and page mode.
8) I have done some tests with RLS and things seem to work (filters of
SELECT policies are taken into account after fetching the tuples from
the scan), but I would recommend adding some regression tests in this
area as TABLESAMPLE is a new type of physical heap scan.
9) s/visibilty/visibility
10) s/acceps/accepts.
11) s/dont't/don't
12) Is actually tsmexaminetuple necessary now? This is not used in
this patch at all, and looks to me like an optimization needed for
custom sampling methods. Keeping in mind the core feature I think that
this should be removed for now, let's reintroduce it later if there is
a real test case that shows up, catalogs are extensible AFAIK.
13) Some regression tests with pg_tablesample_method would be welcome.
14) Some comments about this part:
+void
+sampler_random_init_state(long seed, SamplerRandomState randstate)
+{
+ randstate[0] = 0x330e;
+ randstate[1] = (unsigned short) seed;
+ randstate[2] = (unsigned short) (seed >> 16);
+}
/* Select a random value R uniformly distributed in (0 - 1) */
double
-sampler_random_fract()
+sampler_random_fract(SamplerRandomState randstate)
{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+ return pg_erand48(randstate);
}
Hm. Doesn't this impact the selectivity of ANALYZE as well? I think
that we should be careful and use separate methods. I think that you
should use RAND48_SEED_0, _rand48_seed and friends as well instead of
hardcoding those values in your code.
15)
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method>
(<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than SQL Standard so we pass generic function
+ * arguments to the sampling method.
+ */
This comment should be reformulated...
And some tests:
1) The patch fails correctly when something else than a table or a
matview is used:
=# select * from aav tablesample BERNOULLI (5.5) REPEATABLE (1);
ERROR: 42601: TABLESAMPLE clause can only be used on tables and
materialized views
2) Already mentioned upthread, but WITH clause fails strangely:
=# with query_select as (select a from aa) select
query_select.a from
query_select tablesample BERNOULLI (5.5) REPEATABLE (1);
ERROR: 42P01: relation "query_select" does not exist
I guess that this patch should only allow direct references to tables
or matviews in a FROM clause.
3) Using ALIAS with subqueries...
This works:
=# select i from aa as bb(i) tablesample BERNOULLI (5.5) REPEATABLE (1);
i
---
1
6
(2 rows)
Now I find surprising to see a failure here referring to a syntax failure:
=# select i from (select a from aa) as b(i) tablesample BERNOULLI
(5.5) REPEATABLE (1);
ERROR: 42601: syntax error at or near "tablesample"
4) A dummy sampling method:
=# explain select a from only test tablesample toto (100) REPEATABLE
(10) union select b from aa;
ERROR: 42704: tablesample method "toto" does not exist
5) REPEATABLE and NULL:
=# explain select a from only test tablesample bernoulli (100)
REPEATABLE (NULL) union select b from aa;
ERROR: 22023: REPEATABLE clause must be NOT NULL numeric value
6) Funnily I could not trigger this error:
+ if (base_rte->tablesample)
+ return gettext_noop("Views containing TABLESAMPLE are
not automatically updatable.");
7) Please add a test for that as well for both bernoulli and system:
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric
value between 0 and 100 (inclusive).")));
The regression tests should cover as well those error scenarios.
Just a suggestion: but for 9.5 perhaps we should aim just at patches 1
and 2, and drop the custom TABLESAMPLE methods.
Regards,
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4/9/15 7:47 PM, Simon Riggs wrote:
Having a function-base implementation allows stratified sampling or
other approaches suited directly to user's data.
How would you implement stratified sampling with this function
interface? You'd need to pass the stratification criteria into the
function somehow. But those would be column names or expressions.
I don't think it's reasonable to force all methods to offer both limits
on numbers of rows or percentages. They may not be applicable.
Examples?
In a stratified sample I would still ask for X percent from each stratum
or Y rows from each stratum.
On 4/9/15 8:58 PM, Petr Jelinek wrote:
Well, you can have two approaches to this, either allow some specific
set of keywords that can be used to specify limit, or you let sampling
methods interpret parameters, I believe the latter is more flexible.
There is nothing stopping somebody writing sampling method which takes
limit as number of rows, or anything else.
Also for example for BERNOULLI to work correctly you'd need to convert
the number of rows to fraction of table anyway (and that's exactly what
the one database which has this feature does internally) and then it's
no different than passing (SELECT 100/reltuples*number_of_rows FROM
tablename) as a parameter.
What is your intended use case for this feature? I know that "give me
100 random rows from this table quickly" is a common use case, but
that's kind of cumbersome if you need to apply formulas like that. I'm
not sure what the use of a percentage is. Presumably, the main use of
this feature is on large tables. But then you might not even know how
large it really is, and even saying 0.1% might be more than you wanted
to handle.
On 10/04/15 21:26, Peter Eisentraut wrote:
On 4/9/15 8:58 PM, Petr Jelinek wrote:
Well, you can have two approaches to this, either allow some specific
set of keywords that can be used to specify limit, or you let sampling
methods interpret parameters, I believe the latter is more flexible.
There is nothing stopping somebody writing sampling method which takes
limit as number of rows, or anything else.
Also for example for BERNOULLI to work correctly you'd need to convert
the number of rows to fraction of table anyway (and that's exactly what
the one database which has this feature does internally) and then it's
no different than passing (SELECT 100/reltuples*number_of_rows FROM
tablename) as a parameter.
What is your intended use case for this feature? I know that "give me
100 random rows from this table quickly" is a common use case, but
that's kind of cumbersome if you need to apply formulas like that. I'm
not sure what the use of a percentage is. Presumably, the main use of
this feature is on large tables. But then you might not even know how
large it really is, and even saying 0.1% might be more than you wanted
to handle.
My main intended use-case is analytics on very big tables. The
percentages of population vs confidence levels are pretty well mapped
there and you can get quite big speedups if you are fine with getting
results with slightly smaller confidence (ie you care about ballpark
figures).
But this was not really my point, the BERNOULLI just does not work well
with row-limit by definition, it applies probability on each individual
row and while you can get probability from percentage very easily (just
divide by 100), to get it for specific target number of rows you have to
know total number of source rows and that's not something we can do very
accurately so then you won't get 500 rows but approximately 500 rows.
In any case for "give me 500 somewhat random rows from table quickly"
you want probably SYSTEM sampling anyway as it will be orders of
magnitude faster on big tables and yes even 0.1% might be more than you
wanted in that case. I am not against having row limit input for methods
which can work with it like SYSTEM but then that's easily doable by
adding separate sampling method which accepts rows (even if sampling
algorithm itself is same). In current approach all you'd have to do is
write different init function for the sampling method and register it
under new name (yes it won't be named SYSTEM but for example
SYSTEM_ROWLIMIT then).
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 04/10/15 21:57, Petr Jelinek wrote:
On 10/04/15 21:26, Peter Eisentraut wrote:
But this was not really my point, the BERNOULLI just does not work
well with row-limit by definition, it applies probability on each
individual row and while you can get probability from percentage very
easily (just divide by 100), to get it for specific target number of
rows you have to know total number of source rows and that's not
something we can do very accurately so then you won't get 500 rows
but approximately 500 rows.
It's actually even trickier. Even if you happen to know the exact number
of rows in the table, you can't just convert that into a percentage like
that and use it for BERNOULLI sampling. It may give you different number
of result rows, because each row is sampled independently.
That is why we have Vitter's algorithm for reservoir sampling, which
works very differently from BERNOULLI.
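To make the difference concrete, here is a minimal self-contained C sketch. It is not the ANALYZE code: the reservoir routine is the simple "Algorithm R" variant rather than Vitter's optimized algorithm, and ToyRng, toy_rng_fract, bernoulli_sample and reservoir_sample are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 48-bit linear congruential generator; illustration only. */
typedef struct
{
    uint64_t state;
} ToyRng;

/* Return a pseudo-random value in [0, 1). */
static double
toy_rng_fract(ToyRng *rng)
{
    rng->state = (rng->state * UINT64_C(0x5DEECE66D) + UINT64_C(0xB))
        & ((UINT64_C(1) << 48) - 1);
    return (double) rng->state / (double) (UINT64_C(1) << 48);
}

/*
 * BERNOULLI-style sampling: each row is kept independently with
 * probability p, so the number of rows returned varies from run to run.
 * "out" must have room for nrows entries.
 */
static size_t
bernoulli_sample(size_t nrows, double p, ToyRng *rng, size_t *out)
{
    size_t kept = 0;

    for (size_t row = 0; row < nrows; row++)
    {
        if (toy_rng_fract(rng) < p)
            out[kept++] = row;
    }
    return kept;
}

/*
 * Reservoir sampling (Algorithm R): always returns min(k, nrows) rows,
 * but has to hold all k candidates in memory until the scan is finished.
 */
static size_t
reservoir_sample(size_t nrows, size_t k, ToyRng *rng, size_t *reservoir)
{
    size_t filled = 0;

    for (size_t row = 0; row < nrows; row++)
    {
        if (filled < k)
            reservoir[filled++] = row;
        else
        {
            /* Row replaces a random slot with probability k / (row + 1). */
            size_t j = (size_t) (toy_rng_fract(rng) * (double) (row + 1));

            if (j < k)
                reservoir[j] = row;
        }
    }
    return filled;
}
```

With nrows = 10000, p = 0.05 and k = 500, the first function returns some count near 500 while the second returns exactly 500.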
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/04/15 22:16, Tomas Vondra wrote:
On 04/10/15 21:57, Petr Jelinek wrote:
On 10/04/15 21:26, Peter Eisentraut wrote:
But this was not really my point, the BERNOULLI just does not work
well with row-limit by definition, it applies probability on each
individual row and while you can get probability from percentage very
easily (just divide by 100), to get it for specific target number of
rows you have to know total number of source rows and that's not
something we can do very accurately so then you won't get 500 rows
but approximately 500 rows.
It's actually even trickier. Even if you happen to know the exact number
of rows in the table, you can't just convert that into a percentage like
that and use it for BERNOULLI sampling. It may give you different number
of result rows, because each row is sampled independently.
That is why we have Vitter's algorithm for reservoir sampling, which
works very differently from BERNOULLI.
Hmm, this actually gives me an idea - perhaps we could expose Vitter's
reservoir sampling as another sampling method for people who want the
"give me 500 rows from table fast" behavior then? We already have it
implemented; it's just a matter of adding the glue.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10 April 2015 at 15:26, Peter Eisentraut <peter_e@gmx.net> wrote:
What is your intended use case for this feature?
Likely use cases are:
* Limits on numbers of rows in sample. Some research colleagues have
published a new mathematical analysis that will allow a lower limit
than previously considered.
* Time limits on sampling. This allows data visualisation approaches
to gain approximate answers in real time.
* Stratified sampling. Anything with some kind of filtering, lifting
or bias. Allows filtering out known incomplete data.
* Limits on sample error
Later use cases would allow custom aggregates to work together with
custom sampling methods, so we might work our way towards i) a SUM()
function that provides the right answer even when used with a sample
scan, ii) custom aggregates that report the sample error, allowing you
to get both AVG() and AVG_STDERR(). That would be technically possible
with what we have here, but I think a lot more thought is required yet.
These have all come out of detailed discussions with two different
groups of data mining researchers.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services
On Sat, Apr 11, 2015 at 12:56 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
On 4/9/15 8:58 PM, Petr Jelinek wrote:
Well, you can have two approaches to this, either allow some specific
set of keywords that can be used to specify limit, or you let sampling
methods interpret parameters, I believe the latter is more flexible.
There is nothing stopping somebody writing sampling method which takes
limit as number of rows, or anything else.
Also for example for BERNOULLI to work correctly you'd need to convert
the number of rows to fraction of table anyway (and that's exactly what
the one database which has this feature does internally) and then it's
no different than passing (SELECT 100/reltuples*number_of_rows FROM
tablename) as a parameter.
What is your intended use case for this feature? I know that "give me
100 random rows from this table quickly" is a common use case, but
that's kind of cumbersome if you need to apply formulas like that. I'm
not sure what the use of a percentage is. Presumably, the main use of
this feature is on large tables. But then you might not even know how
large it really is, and even saying 0.1% might be more than you wanted
to handle.
The use case for specifying number of rows for sample scan is valid
and can be achieved by other means if required as suggested by Petr
Jelinek, however the current proposed syntax (w.r.t. Sample
Percentage [1]) seems to comply with SQL standard, so why not go
for it and then extend it based on more use-cases?
[1] SQL Standard (2003) w.r.t. Sample Percentage
<sample clause> ::=
TABLESAMPLE <sample method> <left paren> <sample percentage> <right paren>
[ <repeatable clause> ]
<sample percentage> ::= <numeric value expression>
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 10/04/15 06:46, Michael Paquier wrote:
Mentioning again that patch 1 is interesting as a separate change to
move the sampling logic out of the ANALYZE code in its own portion.
I had a look at patch 2. Here are some comments:
1)
+ <row>
+ <entry><structfield>tsmseqscan</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method scan the whole table sequentially?
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmpagemode</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method always read whole pages?
+ </entry>
+ </row>
I think that those descriptions using question marks are not adapted.
They could be reformulated as follows:
If true, the sampling method scans the whole table sequentially.
If true, the sampling method always reads the pages completely.
Agreed, I didn't like it much either, was just trying to copy style of
indexam.
2) Shouldn't there be some EXPLAIN output in the regression tests?
Done.
3) The documentation should clearly state that TABLESAMPLE can only be
used on matviews and tables, and can only accept items directly
referenced in a FROM clause, aka no WITH or no row subsets in a
subquery. As things stand, TABLESAMPLE being mentioned in the line of
ONLY, users may think that views are supported as ONLY can be used
with views as well.
I added it to standard compatibility section.
4)
+ The <literal>BERNOULLI</literal> scans whole table and returns
+ individual rows with equal probability. Additional sampling methods
+ may be installed in the database via extensions.
In this patch extensions are mentioned but this is implemented only in
patch 3. Hence this reference should be removed.
They can be installed even without patch 3. The patch 3 just adds SQL
interface for it. This is like saying nobody can write index am which
certainly is possible even if not easy.
6) If the sample method strategies are extended at some point, I think
that you want to use a bitmap in heap_beginscan_ss and friends and not
a set of boolean arguments.
Which can be done once this is actually an issue, it's internal API so
it's fine to change it when needed.
7) This is surprising and I can't understand why things are mixed up here:
- scan->rs_pageatatime = IsMVCCSnapshot(snapshot);
+ scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(snapshot);
Isn't what you want here a different parameter? I am sure that we do
not want to mix up visibility with MVCC snapshots and page mode.
The whole point of pagemode is to make visibility checks for all tuples
on single page in one go because the check requires buffer lock so those
two are very connected.
8) I have done some tests with RLS and things seem to work (filters of
SELECT policies are taken into account after fetching the tuples from
the scan), but I would recommend adding some regression tests in this
area as TABLESAMPLE is a new type of physical heap scan.
Hmm RLS, I freely admit that I don't know what all needs to be tested
there, but I added some simple test for it.
9) s/visibilty/visibility
10) s/acceps/accepts.
11) s/dont't/don't
I can't write apparently :)
12) Is actually tsmexaminetuple necessary now? This is not used in
this patch at all, and looks to me like an optimization needed for
custom sampling methods. Keeping in mind the core feature I think that
this should be removed for now, let's reintroduce it later if there is
a real test case that shows up, catalogs are extensible AFAIK.
Yes, for some sampling methods it's useful as demonstrated by the test
module. I was actually asked off-list for this by a committer and Simon
also sees use for it and I personally see it as good thing for
extendability as well so I'd really like to keep it.
13) Some regression tests with pg_tablesample_method would be welcome.
Not sure what you mean by that.
14) Some comments about this part:
+void
+sampler_random_init_state(long seed, SamplerRandomState randstate)
+{
+ randstate[0] = 0x330e;
+ randstate[1] = (unsigned short) seed;
+ randstate[2] = (unsigned short) (seed >> 16);
+}
/* Select a random value R uniformly distributed in (0 - 1) */
double
-sampler_random_fract()
+sampler_random_fract(SamplerRandomState randstate)
{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+ return pg_erand48(randstate);
}
Hm. Doesn't this impact the selectivity of ANALYZE as well? I think
that we should be careful and use separate methods. I think that you
Well, ANALYZE is using pg_erand48 anyway and I use random() as seed so I
think it shouldn't but I agree that it's something committer should pay
attention to.
should use RAND48_SEED_0, _rand48_seed and friends as well instead of
hardcoding those values in your code.
Yes, done.
15)
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than SQL Standard so we pass generic function
+ * arguments to the sampling method.
+ */
This comment should be reformulated...
Done.
And some tests:
1) The patch fails correctly when something else than a table or a
matview is used:
=# select * from aav tablesample BERNOULLI (5.5) REPEATABLE (1);
ERROR: 42601: TABLESAMPLE clause can only be used on tables and
materialized views
2) Already mentioned upthread, but WITH clause fails strangely:
=# with query_select as (select a from aa) select query_select.a from
query_select tablesample BERNOULLI (5.5) REPEATABLE (1);
ERROR: 42P01: relation "query_select" does not exist
I guess that this patch should only allow direct references to tables
or matviews in a FROM clause.
I added CTE check and regression test for it.
Now I find surprising to see a failure here referring to a syntax failure:
=# select i from (select a from aa) as b(i) tablesample BERNOULLI
(5.5) REPEATABLE (1);
ERROR: 42601: syntax error at or near "tablesample"
Well that's because we handle this already in parser.
6) Funnily I could not trigger this error:
+ if (base_rte->tablesample)
+ return gettext_noop("Views containing TABLESAMPLE are not automatically updatable.");
7) Please add a test for that as well for both bernoulli and system:
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
The regression tests should cover as well those error scenarios.
Done.
Just a suggestion: but for 9.5 perhaps we should aim just at patches 1
and 2, and drop the custom TABLESAMPLE methods.
I agree that DDL patch is not that important to get in (and I made it
last patch in the series now), which does not mean somebody can't write
the extension with new tablesample method.
In any case attached another version.
Changes:
- I addressed the comments from Michael
- I moved the interface between nodeSampleScan and the actual sampling
method to its own .c file and added TableSampleDesc struct for it. This
makes the interface cleaner and will make it more straightforward to
extend for subqueries in the future (nothing really changes just some
functions were renamed and moved). Amit suggested this at some point and
I thought it's not needed at that time but with the possible future
extension to subquery support I changed my mind.
- renamed heap_beginscan_ss to heap_beginscan_sampling to avoid
confusion with sync scan
- reworded some things and more typo fixes
- Added two sample contrib modules demonstrating row limited and time
limited sampling. I am using linear probing for both of those as the
builtin block sampling is not well suited for row limited or time
limited sampling. For row limited I originally thought of using the
Vitter's reservoir sampling but that does not fit well with the executor
as it needs to keep the reservoir of all the output tuples in memory
which would have horrible memory requirements if the limit was high. The
linear probing seems to work quite well for the use case of "give me 500
random rows from table".
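The block-visiting idea behind the linear probing can be sketched roughly as follows. This is a simplified illustration and not the code from the contrib modules: gcd_u32, coprime_step and next_block are hypothetical names, and the step search here is deterministic, whereas the patch declares a random_relative_prime() helper for this purpose. Because the step is coprime with the number of blocks, repeatedly adding it modulo nblocks visits every block exactly once before repeating, without keeping any per-block state.

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static uint32_t
gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0)
    {
        uint32_t t = a % b;

        a = b;
        b = t;
    }
    return a;
}

/*
 * Starting from a (normally random) candidate, find a step in
 * [1, nblocks) that is coprime with nblocks.  The loop terminates
 * because 1 is always coprime with nblocks.
 */
static uint32_t
coprime_step(uint32_t nblocks, uint32_t candidate)
{
    uint32_t step;

    if (nblocks <= 1)
        return 1;

    step = candidate % nblocks;
    if (step == 0)
        step = 1;

    while (gcd_u32(step, nblocks) != 1)
    {
        step++;
        if (step >= nblocks)
            step = 1;
    }
    return step;
}

/* Advance to the next block; 64-bit math avoids overflow in the sum. */
static uint32_t
next_block(uint32_t current, uint32_t step, uint32_t nblocks)
{
    return (uint32_t) (((uint64_t) current + step) % nblocks);
}
```

For 31 blocks and a step of 7, starting from block 0, the visiting order is 7, 14, 21, 28, 4, 11, and so on, which appears consistent with the id ordering in the tsm_system_time regression output in the attached patch.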
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0004-tablesample-contrib-add-system_time-sampling-method.patch (binary/octet-stream)
From 400fd6f943fe094837bd1ee426d1050fea46f5b2 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 15 Apr 2015 00:45:57 +0200
Subject: [PATCH 4/6] tablesample-contrib: add system_time sampling method
---
contrib/tsm_system_time/.gitignore | 4 +
contrib/tsm_system_time/Makefile | 21 ++
.../tsm_system_time/expected/tsm_system_time.out | 54 ++++
contrib/tsm_system_time/sql/tsm_system_time.sql | 14 +
contrib/tsm_system_time/tsm_system_time--1.0.sql | 40 +++
contrib/tsm_system_time/tsm_system_time.c | 315 +++++++++++++++++++++
contrib/tsm_system_time/tsm_system_time.control | 5 +
7 files changed, 453 insertions(+)
create mode 100644 contrib/tsm_system_time/.gitignore
create mode 100644 contrib/tsm_system_time/Makefile
create mode 100644 contrib/tsm_system_time/expected/tsm_system_time.out
create mode 100644 contrib/tsm_system_time/sql/tsm_system_time.sql
create mode 100644 contrib/tsm_system_time/tsm_system_time--1.0.sql
create mode 100644 contrib/tsm_system_time/tsm_system_time.c
create mode 100644 contrib/tsm_system_time/tsm_system_time.control
diff --git a/contrib/tsm_system_time/.gitignore b/contrib/tsm_system_time/.gitignore
new file mode 100644
index 0000000..5dcb3ff
--- /dev/null
+++ b/contrib/tsm_system_time/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/tsm_system_time/Makefile b/contrib/tsm_system_time/Makefile
new file mode 100644
index 0000000..c42c1c6
--- /dev/null
+++ b/contrib/tsm_system_time/Makefile
@@ -0,0 +1,21 @@
+# contrib/tsm_system_time/Makefile
+
+MODULE_big = tsm_system_time
+OBJS = tsm_system_time.o $(WIN32RES)
+PGFILEDESC = "tsm_system_time - SYSTEM TABLESAMPLE method which accepts a time limit"
+
+EXTENSION = tsm_system_time
+DATA = tsm_system_time--1.0.sql
+
+REGRESS = tsm_system_time
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/tsm_system_time
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/tsm_system_time/expected/tsm_system_time.out b/contrib/tsm_system_time/expected/tsm_system_time.out
new file mode 100644
index 0000000..32ad03c
--- /dev/null
+++ b/contrib/tsm_system_time/expected/tsm_system_time.out
@@ -0,0 +1,54 @@
+CREATE EXTENSION tsm_system_time;
+CREATE TABLE test_tablesample (id int, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 1000) FROM generate_series(0, 30) s(i) ORDER BY i;
+ANALYZE test_tablesample;
+SELECT count(*) FROM test_tablesample TABLESAMPLE system_time (1000);
+ count
+-------
+ 31
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE system_time (1000) REPEATABLE (5432);
+ id
+----
+ 7
+ 14
+ 21
+ 28
+ 4
+ 11
+ 18
+ 25
+ 1
+ 8
+ 15
+ 22
+ 29
+ 5
+ 12
+ 19
+ 26
+ 2
+ 9
+ 16
+ 23
+ 30
+ 6
+ 13
+ 20
+ 27
+ 3
+ 10
+ 17
+ 24
+ 0
+(31 rows)
+
+EXPLAIN SELECT id FROM test_tablesample TABLESAMPLE system_time (100) REPEATABLE (10);
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Sample Scan (system_time) on test_tablesample (cost=0.00..100.25 rows=25 width=4)
+(1 row)
+
+-- done
+DROP TABLE test_tablesample CASCADE;
diff --git a/contrib/tsm_system_time/sql/tsm_system_time.sql b/contrib/tsm_system_time/sql/tsm_system_time.sql
new file mode 100644
index 0000000..68dbbf9
--- /dev/null
+++ b/contrib/tsm_system_time/sql/tsm_system_time.sql
@@ -0,0 +1,14 @@
+CREATE EXTENSION tsm_system_time;
+
+CREATE TABLE test_tablesample (id int, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 1000) FROM generate_series(0, 30) s(i) ORDER BY i;
+ANALYZE test_tablesample;
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE system_time (1000);
+SELECT id FROM test_tablesample TABLESAMPLE system_time (1000) REPEATABLE (5432);
+
+EXPLAIN SELECT id FROM test_tablesample TABLESAMPLE system_time (100) REPEATABLE (10);
+
+-- done
+DROP TABLE test_tablesample CASCADE;
diff --git a/contrib/tsm_system_time/tsm_system_time--1.0.sql b/contrib/tsm_system_time/tsm_system_time--1.0.sql
new file mode 100644
index 0000000..834ee77
--- /dev/null
+++ b/contrib/tsm_system_time/tsm_system_time--1.0.sql
@@ -0,0 +1,40 @@
+/* contrib/tsm_system_time/tsm_system_time--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION tsm_system_time" to load this file. \quit
+
+CREATE FUNCTION tsm_system_time_init(internal, int4, int4)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_system_time_nextblock(internal)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_system_time_nexttuple(internal, int4, int2)
+RETURNS int2
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_system_time_end(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_system_time_reset(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_system_time_cost(internal, internal, internal, internal, internal, internal, internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+INSERT INTO pg_tablesample_method VALUES('system_time', false, true,
+ 'tsm_system_time_init', 'tsm_system_time_nextblock',
+ 'tsm_system_time_nexttuple', '-', 'tsm_system_time_end',
+ 'tsm_system_time_reset', 'tsm_system_time_cost');
+
diff --git a/contrib/tsm_system_time/tsm_system_time.c b/contrib/tsm_system_time/tsm_system_time.c
new file mode 100644
index 0000000..efb127c
--- /dev/null
+++ b/contrib/tsm_system_time/tsm_system_time.c
@@ -0,0 +1,315 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_system_time.c
+ * interface routines for system_time tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/tsm_system_time/tsm_system_time.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/tablesample.h"
+#include "access/relscan.h"
+#include "miscadmin.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+#include "utils/spccache.h"
+#include "utils/timestamp.h"
+
+PG_MODULE_MAGIC;
+
+/*
+ * State
+ */
+typedef struct
+{
+ SamplerRandomState randstate;
+ uint32 seed; /* random seed */
+ BlockNumber nblocks; /* number of blocks in relation */
+ int32 time; /* time limit for sampling */
+ TimestampTz start_time; /* start time of sampling */
+ TimestampTz end_time; /* end time of sampling */
+ OffsetNumber lt; /* last tuple returned from current block */
+ BlockNumber step; /* step size */
+ BlockNumber lb; /* last block visited */
+ BlockNumber estblocks; /* estimated number of returned blocks (moving) */
+ BlockNumber doneblocks; /* number of already returned blocks */
+} SystemSamplerData;
+
+
+PG_FUNCTION_INFO_V1(tsm_system_time_init);
+PG_FUNCTION_INFO_V1(tsm_system_time_nextblock);
+PG_FUNCTION_INFO_V1(tsm_system_time_nexttuple);
+PG_FUNCTION_INFO_V1(tsm_system_time_end);
+PG_FUNCTION_INFO_V1(tsm_system_time_reset);
+PG_FUNCTION_INFO_V1(tsm_system_time_cost);
+
+static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate);
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_time_init(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ int32 time = PG_ARGISNULL(2) ? -1 : PG_GETARG_INT32(2);
+ HeapScanDesc scan = tsdesc->heapScan;
+ SystemSamplerData *sampler;
+
+ if (time < 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid time limit"),
+ errhint("Time limit must be positive integer value.")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->nblocks = scan->rs_nblocks;
+ sampler->lt = InvalidOffsetNumber;
+ sampler->estblocks = 2;
+ sampler->doneblocks = 0;
+ sampler->time = time;
+ sampler->start_time = GetCurrentTimestamp();
+ sampler->end_time = TimestampTzPlusMilliseconds(sampler->start_time,
+ sampler->time);
+
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ /* Find relative prime as step size for linear probing. */
+ sampler->step = random_relative_prime(sampler->nblocks, sampler->randstate);
+ /*
+ * Randomize start position so that blocks close to step size don't have
+ * higher probability of being chosen on very short scan.
+ */
+ sampler->lb = sampler_random_fract(sampler->randstate) * (sampler->nblocks / sampler->step);
+
+ tsdesc->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses linear probing algorithm for picking next block.
+ */
+Datum
+tsm_system_time_nextblock(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) tsdesc->tsmdata;
+
+ sampler->lb = (sampler->lb + sampler->step) % sampler->nblocks;
+ sampler->doneblocks++;
+
+ /* All blocks have been read, we're done */
+ if (sampler->doneblocks > sampler->nblocks)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ /*
+ * Update the estimations for time limit at least 10 times per estimated
+ * number of returned blocks to handle variations in block read speed.
+ */
+ if (sampler->doneblocks % Max(sampler->estblocks/10, 1) == 0)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+ long secs;
+ int usecs;
+ int usecs_remaining;
+ int time_per_block;
+
+ TimestampDifference(sampler->start_time, now, &secs, &usecs);
+ usecs += (int) secs * 1000000;
+
+ time_per_block = usecs / sampler->doneblocks;
+
+ /* No time left, end. */
+ TimestampDifference(now, sampler->end_time, &secs, &usecs);
+ if (secs <= 0 && usecs <= 0)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ /* Remaining microseconds */
+ usecs_remaining = usecs + (int) secs * 1000000;
+
+ /* Recalculate estimated returned number of blocks */
+ if (time_per_block < usecs_remaining && time_per_block > 0)
+ sampler->estblocks = ((int64) sampler->time * 1000) / time_per_block;
+ }
+
+ PG_RETURN_UINT32(sampler->lb);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_time_nexttuple(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) tsdesc->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_time_end(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+
+ pfree(tsdesc->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_time_reset(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) tsdesc->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ sampler->start_time = GetCurrentTimestamp();
+ sampler->end_time = TimestampTzPlusMilliseconds(sampler->start_time,
+ sampler->time);
+ sampler->estblocks = 2;
+ sampler->doneblocks = 0;
+
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+ sampler->step = random_relative_prime(sampler->nblocks, sampler->randstate);
+ sampler->lb = sampler_random_fract(sampler->randstate) * (sampler->nblocks / sampler->step);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_time_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *limitnode;
+ int32 time;
+ BlockNumber relpages;
+ double reltuples;
+ double density;
+ double spc_random_page_cost;
+
+ limitnode = linitial(args);
+ limitnode = estimate_expression_value(root, limitnode);
+
+ if (IsA(limitnode, RelabelType))
+ limitnode = (Node *) ((RelabelType *) limitnode)->arg;
+
+ if (IsA(limitnode, Const))
+ time = DatumGetInt32(((Const *) limitnode)->constvalue);
+ else
+ {
+ /* Default time (1s) if the estimation didn't return Const. */
+ time = 1000;
+ }
+
+ relpages = baserel->pages;
+ reltuples = baserel->tuples;
+
+ /* estimate the tuple density */
+ if (relpages > 0)
+ density = reltuples / (double) relpages;
+ else
+ density = (BLCKSZ - SizeOfPageHeaderData) / baserel->width;
+
+ /*
+ * We equate the random page cost value with the number of ms it takes to
+ * read a random page, which is far from accurate, but we don't have
+ * anything better on which to base our predicted page reads.
+ */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ NULL);
+
+ /*
+ * Assumption here is that we'll never read less than 1% of table pages,
+ * this is here mainly because it is much less bad to overestimate than
+ * underestimate and using just spc_random_page_cost will probably lead
+ * to underestimations in general.
+ */
+ *pages = Min(baserel->pages, Max(time/spc_random_page_cost, baserel->pages/100));
+ *tuples = rint(density * (double) *pages * path->rows / baserel->tuples);
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
+
+static uint32
+gcd (uint32 a, uint32 b)
+{
+ uint32 c;
+
+ while (a != 0)
+ {
+ c = a;
+ a = b % a;
+ b = c;
+ }
+
+ return b;
+}
+
+static uint32
+random_relative_prime(uint32 n, SamplerRandomState randstate)
+{
+ /* Pick random starting number, with some limits on what it can be. */
+ uint32 r = (uint32) (sampler_random_fract(randstate) * n / 2) + n / 4,
+ t;
+
+ /*
+ * This should only take 2 or 3 iterations as the probability of 2 numbers
+ * being relatively prime is ~61%.
+ */
+ while ((t = gcd(r, n)) > 1)
+ {
+ CHECK_FOR_INTERRUPTS();
+ r /= t;
+ }
+
+ return r;
+}
diff --git a/contrib/tsm_system_time/tsm_system_time.control b/contrib/tsm_system_time/tsm_system_time.control
new file mode 100644
index 0000000..ebcee19
--- /dev/null
+++ b/contrib/tsm_system_time/tsm_system_time.control
@@ -0,0 +1,5 @@
+# tsm_system_time extension
+comment = 'SYSTEM TABLESAMPLE method which accepts time in milliseconds as a limit'
+default_version = '1.0'
+module_pathname = '$libdir/tsm_system_time'
+relocatable = true
--
1.9.1
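The estimate in tsm_system_time_nextblock above boils down to "time limit divided by measured time per block". The following is only a standalone sketch of that arithmetic with the unit conversion spelled out (limit in milliseconds, measurements in microseconds). The names estimate_blocks and recheck_interval are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/*
 * Estimate how many blocks fit into a time budget, given the elapsed time
 * and the number of blocks read so far.  Returns 0 when no estimate can
 * be made yet.  (Hypothetical helper, not part of the patch.)
 */
static uint32_t
estimate_blocks(int32_t budget_ms, int64_t elapsed_usec, uint32_t done_blocks)
{
	int64_t		usec_per_block;
	int64_t		est;

	if (budget_ms <= 0 || done_blocks == 0)
		return 0;

	usec_per_block = elapsed_usec / done_blocks;
	if (usec_per_block <= 0)
		return 0;

	/* budget is in milliseconds, per-block time in microseconds */
	est = ((int64_t) budget_ms * 1000) / usec_per_block;

	return est > UINT32_MAX ? UINT32_MAX : (uint32_t) est;
}

/*
 * How often to look at the clock again: about ten times per estimated
 * scan, but at least once per block.
 */
static uint32_t
recheck_interval(uint32_t est_blocks)
{
	uint32_t	interval = est_blocks / 10;

	return interval > 0 ? interval : 1;
}
```

For example, with a 1000 ms limit and 100 blocks read in 50 ms, the per-block time is 500 usec, so about 2000 blocks fit into the limit and the clock would be rechecked every 200 blocks.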
0003-tablesample-contrib-add-system_rows-sampling-method.patchbinary/octet-stream; name=0003-tablesample-contrib-add-system_rows-sampling-method.patchDownload
>From 275f7ad324bdd372517427b9a282573b6a6c2ffa Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Mon, 13 Apr 2015 18:20:16 +0200
Subject: [PATCH 3/6] tablesample-contrib: add system_rows sampling method
---
contrib/tsm_system_rows/.gitignore | 4 +
contrib/tsm_system_rows/Makefile | 21 ++
.../tsm_system_rows/expected/tsm_system_rows.out | 31 +++
contrib/tsm_system_rows/sql/tsm_system_rows.sql | 14 ++
contrib/tsm_system_rows/tsm_system_rows--1.0.sql | 45 ++++
contrib/tsm_system_rows/tsm_system_rows.c | 270 +++++++++++++++++++++
contrib/tsm_system_rows/tsm_system_rows.control | 5 +
7 files changed, 390 insertions(+)
create mode 100644 contrib/tsm_system_rows/.gitignore
create mode 100644 contrib/tsm_system_rows/Makefile
create mode 100644 contrib/tsm_system_rows/expected/tsm_system_rows.out
create mode 100644 contrib/tsm_system_rows/sql/tsm_system_rows.sql
create mode 100644 contrib/tsm_system_rows/tsm_system_rows--1.0.sql
create mode 100644 contrib/tsm_system_rows/tsm_system_rows.c
create mode 100644 contrib/tsm_system_rows/tsm_system_rows.control
diff --git a/contrib/tsm_system_rows/.gitignore b/contrib/tsm_system_rows/.gitignore
new file mode 100644
index 0000000..5dcb3ff
--- /dev/null
+++ b/contrib/tsm_system_rows/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/tsm_system_rows/Makefile b/contrib/tsm_system_rows/Makefile
new file mode 100644
index 0000000..700ab27
--- /dev/null
+++ b/contrib/tsm_system_rows/Makefile
@@ -0,0 +1,21 @@
+# contrib/tsm_system_rows/Makefile
+
+MODULE_big = tsm_system_rows
+OBJS = tsm_system_rows.o $(WIN32RES)
+PGFILEDESC = "tsm_system_rows - SYSTEM TABLESAMPLE method which accepts number of rows as a limit"
+
+EXTENSION = tsm_system_rows
+DATA = tsm_system_rows--1.0.sql
+
+REGRESS = tsm_system_rows
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/tsm_system_rows
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/tsm_system_rows/expected/tsm_system_rows.out b/contrib/tsm_system_rows/expected/tsm_system_rows.out
new file mode 100644
index 0000000..7e0f72b
--- /dev/null
+++ b/contrib/tsm_system_rows/expected/tsm_system_rows.out
@@ -0,0 +1,31 @@
+CREATE EXTENSION tsm_system_rows;
+CREATE TABLE test_tablesample (id int, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 1000) FROM generate_series(0, 30) s(i) ORDER BY i;
+ANALYZE test_tablesample;
+SELECT count(*) FROM test_tablesample TABLESAMPLE system_rows (1000);
+ count
+-------
+ 31
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE system_rows (8) REPEATABLE (5432);
+ id
+----
+ 7
+ 14
+ 21
+ 28
+ 4
+ 11
+ 18
+ 25
+(8 rows)
+
+EXPLAIN SELECT id FROM test_tablesample TABLESAMPLE system_rows (20) REPEATABLE (10);
+ QUERY PLAN
+-----------------------------------------------------------------------------------
+ Sample Scan (system_rows) on test_tablesample (cost=0.00..80.20 rows=20 width=4)
+(1 row)
+
+-- done
+DROP TABLE test_tablesample CASCADE;
diff --git a/contrib/tsm_system_rows/sql/tsm_system_rows.sql b/contrib/tsm_system_rows/sql/tsm_system_rows.sql
new file mode 100644
index 0000000..bd812220e
--- /dev/null
+++ b/contrib/tsm_system_rows/sql/tsm_system_rows.sql
@@ -0,0 +1,14 @@
+CREATE EXTENSION tsm_system_rows;
+
+CREATE TABLE test_tablesample (id int, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 1000) FROM generate_series(0, 30) s(i) ORDER BY i;
+ANALYZE test_tablesample;
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE system_rows (1000);
+SELECT id FROM test_tablesample TABLESAMPLE system_rows (8) REPEATABLE (5432);
+
+EXPLAIN SELECT id FROM test_tablesample TABLESAMPLE system_rows (20) REPEATABLE (10);
+
+-- done
+DROP TABLE test_tablesample CASCADE;
diff --git a/contrib/tsm_system_rows/tsm_system_rows--1.0.sql b/contrib/tsm_system_rows/tsm_system_rows--1.0.sql
new file mode 100644
index 0000000..9d1b7e2
--- /dev/null
+++ b/contrib/tsm_system_rows/tsm_system_rows--1.0.sql
@@ -0,0 +1,45 @@
+/* contrib/tsm_system_rows/tsm_system_rows--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION tsm_system_rows" to load this file. \quit
+
+CREATE FUNCTION tsm_system_rows_init(internal, int4, int4)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_system_rows_nextblock(internal)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_system_rows_nexttuple(internal, int4, int2)
+RETURNS int2
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_system_rows_examinetuple(internal, int4, internal, bool)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_system_rows_end(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_system_rows_reset(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_system_rows_cost(internal, internal, internal, internal, internal, internal, internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+INSERT INTO pg_tablesample_method VALUES('system_rows', false, true,
+ 'tsm_system_rows_init', 'tsm_system_rows_nextblock',
+ 'tsm_system_rows_nexttuple', 'tsm_system_rows_examinetuple',
+ 'tsm_system_rows_end', 'tsm_system_rows_reset', 'tsm_system_rows_cost');
+
diff --git a/contrib/tsm_system_rows/tsm_system_rows.c b/contrib/tsm_system_rows/tsm_system_rows.c
new file mode 100644
index 0000000..14efb27
--- /dev/null
+++ b/contrib/tsm_system_rows/tsm_system_rows.c
@@ -0,0 +1,270 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_system_rows.c
+ * interface routines for system_rows tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/tsm_system_rows/tsm_system_rows.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/tablesample.h"
+#include "access/relscan.h"
+#include "miscadmin.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+
+PG_MODULE_MAGIC;
+
+/*
+ * State
+ */
+typedef struct
+{
+ SamplerRandomState randstate;
+ uint32 seed; /* random seed */
+ BlockNumber nblocks; /* number of blocks in relation */
+ int32 ntuples; /* number of tuples to return */
+ int32 donetuples; /* tuples already returned */
+ OffsetNumber lt; /* last tuple returned from current block */
+ BlockNumber step; /* step size */
+ BlockNumber lb; /* last block visited */
+ BlockNumber doneblocks; /* number of already returned blocks */
+} SystemSamplerData;
+
+
+PG_FUNCTION_INFO_V1(tsm_system_rows_init);
+PG_FUNCTION_INFO_V1(tsm_system_rows_nextblock);
+PG_FUNCTION_INFO_V1(tsm_system_rows_nexttuple);
+PG_FUNCTION_INFO_V1(tsm_system_rows_examinetuple);
+PG_FUNCTION_INFO_V1(tsm_system_rows_end);
+PG_FUNCTION_INFO_V1(tsm_system_rows_reset);
+PG_FUNCTION_INFO_V1(tsm_system_rows_cost);
+
+static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate);
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_rows_init(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ int32 ntuples = PG_ARGISNULL(2) ? -1 : PG_GETARG_INT32(2);
+ HeapScanDesc scan = tsdesc->heapScan;
+ SystemSamplerData *sampler;
+
+ if (ntuples < 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be positive integer value.")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->nblocks = scan->rs_nblocks;
+ sampler->ntuples = ntuples;
+ sampler->donetuples = 0;
+ sampler->lt = InvalidOffsetNumber;
+ sampler->doneblocks = 0;
+
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ /* Find relative prime as step size for linear probing. */
+ sampler->step = random_relative_prime(sampler->nblocks, sampler->randstate);
+ /*
+ * Randomize start position so that blocks close to step size don't have
+ * higher probability of being chosen on very short scan.
+ */
+ sampler->lb = sampler_random_fract(sampler->randstate) *
+ (sampler->nblocks / sampler->step);
+
+ tsdesc->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses linear probing algorithm for picking next block.
+ */
+Datum
+tsm_system_rows_nextblock(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) tsdesc->tsmdata;
+
+ sampler->lb = (sampler->lb + sampler->step) % sampler->nblocks;
+ sampler->doneblocks++;
+
+ /* All blocks have been read or the row limit was reached, we're done */
+ if (sampler->doneblocks > sampler->nblocks ||
+ sampler->donetuples >= sampler->ntuples)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ PG_RETURN_UINT32(sampler->lb);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_rows_nexttuple(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) tsdesc->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset ||
+ sampler->donetuples >= sampler->ntuples)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Examine tuple and decide if it should be returned.
+ */
+Datum
+tsm_system_rows_examinetuple(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ bool visible = PG_GETARG_BOOL(3);
+ SystemSamplerData *sampler = (SystemSamplerData *) tsdesc->tsmdata;
+
+ if (!visible)
+ PG_RETURN_BOOL(false);
+
+ sampler->donetuples++;
+
+ PG_RETURN_BOOL(true);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_rows_end(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+
+ pfree(tsdesc->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_rows_reset(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) tsdesc->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ sampler->donetuples = 0;
+ sampler->doneblocks = 0;
+
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+ sampler->step = random_relative_prime(sampler->nblocks, sampler->randstate);
+ sampler->lb = sampler_random_fract(sampler->randstate) * (sampler->nblocks / sampler->step);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_rows_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *limitnode;
+ int32 ntuples;
+
+ limitnode = linitial(args);
+ limitnode = estimate_expression_value(root, limitnode);
+
+ if (IsA(limitnode, RelabelType))
+ limitnode = (Node *) ((RelabelType *) limitnode)->arg;
+
+ if (IsA(limitnode, Const))
+ ntuples = DatumGetInt32(((Const *) limitnode)->constvalue);
+ else
+ {
+ /* Default ntuples if the estimation didn't return Const. */
+ ntuples = 1000;
+ }
+
+ *pages = Min(baserel->pages, ntuples);
+ *tuples = ntuples;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
+
+
+static uint32
+gcd (uint32 a, uint32 b)
+{
+ uint32 c;
+
+ while (a != 0)
+ {
+ c = a;
+ a = b % a;
+ b = c;
+ }
+
+ return b;
+}
+
+static uint32
+random_relative_prime(uint32 n, SamplerRandomState randstate)
+{
+ /* Pick random starting number, with some limits on what it can be. */
+ uint32 r = (uint32) (sampler_random_fract(randstate) * n / 2) + n / 4,
+ t;
+
+ /*
+ * This should only take 2 or 3 iterations as the probability of 2 numbers
+ * being relatively prime is ~61%.
+ */
+ while ((t = gcd(r, n)) > 1)
+ {
+ CHECK_FOR_INTERRUPTS();
+ r /= t;
+ }
+
+ return r;
+}
diff --git a/contrib/tsm_system_rows/tsm_system_rows.control b/contrib/tsm_system_rows/tsm_system_rows.control
new file mode 100644
index 0000000..84ea7ad
--- /dev/null
+++ b/contrib/tsm_system_rows/tsm_system_rows.control
@@ -0,0 +1,5 @@
+# tsm_system_rows extension
+comment = 'SYSTEM TABLESAMPLE method which accepts number of rows as a limit'
+default_version = '1.0'
+module_pathname = '$libdir/tsm_system_rows'
+relocatable = true
--
1.9.1
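Both contrib modules above pick blocks by linear probing: lb = (lb + step) % nblocks, with a step that is relatively prime to the block count so that every block is visited exactly once before the sequence repeats. The following is a minimal standalone sketch of that property, not the patch's code. The helper names coprime_step and distinct_blocks_visited are hypothetical, and unlike random_relative_prime in the patch, the candidate is passed in as an argument and a candidate of 0 falls back to a step of 1.

```c
#include <stdint.h>
#include <stdlib.h>

/* Greatest common divisor, same Euclidean loop as in the patch. */
static uint32_t
gcd(uint32_t a, uint32_t b)
{
	while (a != 0)
	{
		uint32_t	c = a;

		a = b % a;
		b = c;
	}

	return b;
}

/*
 * Reduce candidate r until it is relatively prime to n.  A candidate of 0
 * falls back to 1, so the loop terminates for any n >= 1.
 */
static uint32_t
coprime_step(uint32_t n, uint32_t r)
{
	uint32_t	t;

	if (r == 0)
		return 1;

	while ((t = gcd(r, n)) > 1)
		r /= t;

	return r;
}

/*
 * Count the distinct blocks touched by n iterations of
 * lb = (lb + step) % n, starting from "start".
 */
static uint32_t
distinct_blocks_visited(uint32_t n, uint32_t step, uint32_t start)
{
	unsigned char *seen;
	uint32_t	lb;
	uint32_t	count = 0;

	if (n == 0)
		return 0;

	seen = calloc(n, 1);
	if (seen == NULL)
		return 0;

	lb = start % n;
	for (uint32_t i = 0; i < n; i++)
	{
		/* widen before adding so lb + step cannot wrap around */
		lb = (uint32_t) (((uint64_t) lb + step) % n);
		if (!seen[lb])
		{
			seen[lb] = 1;
			count++;
		}
	}

	free(seen);
	return count;
}
```

With 10 blocks, a step of 7 reaches all 10 blocks, while a step of 4 (which shares the factor 2 with 10) only ever reaches 5 of them. This is why the step has to be reduced to a relative prime first.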
0002-tablesample-v13.patchbinary/octet-stream; name=0002-tablesample-v13.patchDownload
>From cb0618e8e21d040c53c501b80cffff17207c1286 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:37:55 +0100
Subject: [PATCH 2/6] tablesample v13
---
contrib/file_fdw/file_fdw.c | 2 +-
contrib/postgres_fdw/postgres_fdw.c | 2 +-
doc/src/sgml/catalogs.sgml | 120 +++++++++
doc/src/sgml/ref/select.sgml | 61 ++++-
src/backend/access/Makefile | 3 +-
src/backend/access/heap/heapam.c | 41 ++-
src/backend/access/tablesample/Makefile | 17 ++
src/backend/access/tablesample/bernoulli.c | 235 +++++++++++++++++
src/backend/access/tablesample/system.c | 186 ++++++++++++++
src/backend/access/tablesample/tablesample.c | 368 +++++++++++++++++++++++++++
src/backend/catalog/Makefile | 2 +-
src/backend/commands/analyze.c | 2 +-
src/backend/commands/explain.c | 19 ++
src/backend/executor/Makefile | 2 +-
src/backend/executor/execAmi.c | 8 +
src/backend/executor/execCurrent.c | 1 +
src/backend/executor/execProcnode.c | 14 +
src/backend/executor/nodeSamplescan.c | 256 +++++++++++++++++++
src/backend/nodes/copyfuncs.c | 60 +++++
src/backend/nodes/equalfuncs.c | 37 +++
src/backend/nodes/nodeFuncs.c | 12 +
src/backend/nodes/outfuncs.c | 48 ++++
src/backend/nodes/readfuncs.c | 45 ++++
src/backend/optimizer/path/allpaths.c | 49 ++++
src/backend/optimizer/path/costsize.c | 67 +++++
src/backend/optimizer/plan/createplan.c | 69 +++++
src/backend/optimizer/plan/setrefs.c | 11 +
src/backend/optimizer/plan/subselect.c | 1 +
src/backend/optimizer/util/pathnode.c | 22 ++
src/backend/parser/gram.y | 36 ++-
src/backend/parser/parse_clause.c | 56 ++++
src/backend/parser/parse_func.c | 143 +++++++++++
src/backend/rewrite/rewriteHandler.c | 3 +
src/backend/utils/adt/ruleutils.c | 50 ++++
src/backend/utils/cache/lsyscache.c | 27 ++
src/backend/utils/cache/syscache.c | 23 ++
src/backend/utils/misc/sampling.c | 33 ++-
src/include/access/heapam.h | 4 +
src/include/access/relscan.h | 1 +
src/include/access/tablesample.h | 60 +++++
src/include/catalog/indexing.h | 5 +
src/include/catalog/pg_proc.h | 26 ++
src/include/catalog/pg_tablesample_method.h | 78 ++++++
src/include/executor/nodeSamplescan.h | 24 ++
src/include/nodes/execnodes.h | 9 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/parsenodes.h | 37 +++
src/include/nodes/plannodes.h | 6 +
src/include/optimizer/cost.h | 1 +
src/include/optimizer/pathnode.h | 2 +
src/include/parser/kwlist.h | 1 +
src/include/parser/parse_func.h | 5 +
src/include/port.h | 4 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 -
src/include/utils/sampling.h | 15 +-
src/include/utils/syscache.h | 2 +
src/port/erand48.c | 3 -
src/test/regress/expected/rowsecurity.out | 26 ++
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/expected/tablesample.out | 216 ++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/rowsecurity.sql | 4 +
src/test/regress/sql/tablesample.sql | 61 +++++
65 files changed, 2694 insertions(+), 37 deletions(-)
create mode 100644 src/backend/access/tablesample/Makefile
create mode 100644 src/backend/access/tablesample/bernoulli.c
create mode 100644 src/backend/access/tablesample/system.c
create mode 100644 src/backend/access/tablesample/tablesample.c
create mode 100644 src/backend/executor/nodeSamplescan.c
create mode 100644 src/include/access/tablesample.h
create mode 100644 src/include/catalog/pg_tablesample_method.h
create mode 100644 src/include/executor/nodeSamplescan.h
create mode 100644 src/test/regress/expected/tablesample.out
create mode 100644 src/test/regress/sql/tablesample.sql
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d541..6a813a3 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1096,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 74ef792..5903384 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2543,7 +2543,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * sampler_random_fract());
+ pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate));
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d0b78f2..ce4bcc4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -269,6 +269,11 @@
</row>
<row>
+ <entry><link linkend="catalog-pg-tablesample-method"><structname>pg_tablesample_method</structname></link></entry>
+ <entry>table sampling methods</entry>
+ </row>
+
+ <row>
<entry><link linkend="catalog-pg-tablespace"><structname>pg_tablespace</structname></link></entry>
<entry>tablespaces within this database cluster</entry>
</row>
@@ -5980,6 +5985,121 @@
</sect1>
+ <sect1 id="catalog-pg-tablesample-method">
+ <title><structname>pg_tablesample_method</structname></title>
+
+ <indexterm zone="catalog-pg-tablesample-method">
+ <primary>pg_tablesample_method</primary>
+ </indexterm>
+
+ <para>
+ The catalog <structname>pg_tablesample_method</structname> stores
+ information about table sampling methods which can be used in
+ <command>TABLESAMPLE</command> clause of a <command>SELECT</command>
+ statement.
+ </para>
+
+ <table>
+ <title><structname>pg_tablesample_method</> Columns</title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+
+ <row>
+ <entry><structfield>oid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry></entry>
+ <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmname</structfield></entry>
+ <entry><type>name</type></entry>
+ <entry></entry>
+ <entry>Name of the sampling method</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmseqscan</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>If true, the sampling method scans the whole table sequentially.
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmpagemode</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>If true, the sampling method always reads the pages completely.
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsminit</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Initialize the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnextblock</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next block number</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmnexttuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Get next tuple offset</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmexaminetuple</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Function which examines the tuple contents and decides whether
+ to return it, or zero if none</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmend</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>End the sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmreset</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry><quote>Restart the state of sampling scan</quote> function</entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmcost</structfield></entry>
+ <entry><type>regproc</type></entry>
+ <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+ <entry>Costing function</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ </sect1>
+
+
<sect1 id="catalog-pg-tablespace">
<title><structname>pg_tablespace</structname></title>
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 2295f63..42e0466 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ]
[ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
<replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
[ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,50 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</varlistentry>
<varlistentry>
+ <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+ <listitem>
+ <para>
+ A <literal>TABLESAMPLE</literal> clause after
+ <replaceable class="parameter">table_name</replaceable> indicates that
+ a <replaceable class="parameter">sampling_method</replaceable> should
+ be used to retrieve a subset of the rows in the table.
+ The <replaceable class="parameter">sampling_method</replaceable> can be
+ any sampling method installed in the database. There are currently two
+ sampling methods available in the standard
+ <productname>PostgreSQL</productname> distribution:
+ <itemizedlist>
+ <listitem>
+ <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+ <para><literal>BERNOULLI</literal></para>
+ </listitem>
+ </itemizedlist>
+ Both of these sampling methods currently accept only a single argument,
+ which is the percentage (a floating-point value from 0 to 100) of the
+ rows to be returned.
+ The <literal>SYSTEM</literal> sampling method does block-level
+ sampling, with each block having the same chance of being selected, and
+ returns all rows from each selected block.
+ The <literal>BERNOULLI</literal> method scans the whole table and returns
+ individual rows with equal probability. Additional sampling methods
+ may be installed in the database via extensions.
+ </para>
+ <para>
+ The optional <literal>REPEATABLE</literal> clause uses the seed
+ parameter, which can be a number or an expression producing a number,
+ as a random seed for sampling. Note that subsequent commands may return
+ different results even if the same <literal>REPEATABLE</literal> clause
+ was specified. This happens because <acronym>DML</acronym> statements and
+ maintenance operations such as <command>VACUUM</> may affect the physical
+ distribution of data. The <function>setseed()</> function will not
+ affect the sampling result when the <literal>REPEATABLE</literal>
+ clause is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">alias</replaceable></term>
<listitem>
<para>
@@ -1927,5 +1971,20 @@ SELECT distributors.* WHERE distributors.name = 'Westward';
<literal>ROWS FROM( ... )</> is an extension of the SQL standard.
</para>
</refsect2>
+
+ <refsect2>
+ <title><literal>TABLESAMPLE</literal> clause</title>
+
+ <para>
+ The <literal>TABLESAMPLE</> clause is currently accepted only on physical
+ relations and materialized views.
+ </para>
+
+ <para>
+ Additional modules allow you to install custom sampling methods and use
+ them instead of the SQL standard methods.
+ </para>
+ </refsect2>
+
</refsect1>
</refentry>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..bd93a6a 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \
+ tablesample transam
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 457cd70..ef39cf2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -79,8 +79,9 @@ bool synchronize_seqscans = true;
static HeapScanDesc heap_beginscan_internal(Relation relation,
Snapshot snapshot,
int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync,
- bool is_bitmapscan, bool temp_snap);
+ bool allow_strat, bool allow_sync, bool allow_pagemode,
+ bool is_bitmapscan, bool is_samplescan,
+ bool temp_snap);
static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -293,9 +294,10 @@ initscan(HeapScanDesc scan, ScanKey key, bool is_rescan)
/*
* Currently, we don't have a stats counter for bitmap heap scans (but the
- * underlying bitmap index scans will be counted).
+ * underlying bitmap index scans will be counted) or sample scans (we only
+ * update stats for tuple fetches there).
*/
- if (!scan->rs_bitmapscan)
+ if (!scan->rs_bitmapscan && !scan->rs_samplescan)
pgstat_count_heap_scan(scan->rs_rd);
}
@@ -314,7 +316,7 @@ heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk, BlockNumber numBlks)
* In page-at-a-time mode it performs additional work, namely determining
* which tuples on the page are visible.
*/
-static void
+void
heapgetpage(HeapScanDesc scan, BlockNumber page)
{
Buffer buffer;
@@ -1297,6 +1299,9 @@ heap_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
* HeapScanDesc for a bitmap heap scan. Although that scan technology is
* really quite unlike a standard seqscan, there is just enough commonality
* to make it worth using the same data structure.
+ *
+ * heap_beginscan_sampling is an alternate entry point for setting up a
+ * HeapScanDesc for a TABLESAMPLE scan.
* ----------------
*/
HeapScanDesc
@@ -1304,7 +1309,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- true, true, false, false);
+ true, true, true, false, false, false);
}
HeapScanDesc
@@ -1314,7 +1319,7 @@ heap_beginscan_catalog(Relation relation, int nkeys, ScanKey key)
Snapshot snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- true, true, false, true);
+ true, true, true, false, false, true);
}
HeapScanDesc
@@ -1323,7 +1328,8 @@ heap_beginscan_strat(Relation relation, Snapshot snapshot,
bool allow_strat, bool allow_sync)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- allow_strat, allow_sync, false, false);
+ allow_strat, allow_sync, true,
+ false, false, false);
}
HeapScanDesc
@@ -1331,14 +1337,24 @@ heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key)
{
return heap_beginscan_internal(relation, snapshot, nkeys, key,
- false, false, true, false);
+ false, false, true, true, false, false);
+}
+
+HeapScanDesc
+heap_beginscan_sampling(Relation relation, Snapshot snapshot,
+ int nkeys, ScanKey key,
+ bool allow_strat, bool allow_pagemode)
+{
+ return heap_beginscan_internal(relation, snapshot, nkeys, key,
+ allow_strat, false, allow_pagemode,
+ false, true, false);
}
static HeapScanDesc
heap_beginscan_internal(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key,
- bool allow_strat, bool allow_sync,
- bool is_bitmapscan, bool temp_snap)
+ bool allow_strat, bool allow_sync, bool allow_pagemode,
+ bool is_bitmapscan, bool is_samplescan, bool temp_snap)
{
HeapScanDesc scan;
@@ -1360,6 +1376,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
scan->rs_snapshot = snapshot;
scan->rs_nkeys = nkeys;
scan->rs_bitmapscan = is_bitmapscan;
+ scan->rs_samplescan = is_samplescan;
scan->rs_strategy = NULL; /* set in initscan */
scan->rs_allow_strat = allow_strat;
scan->rs_allow_sync = allow_sync;
@@ -1368,7 +1385,7 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
/*
* we can use page-at-a-time mode if it's an MVCC-safe snapshot
*/
- scan->rs_pageatatime = IsMVCCSnapshot(snapshot);
+ scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(snapshot);
/*
* For a seqscan in a serializable transaction, acquire a predicate lock
diff --git a/src/backend/access/tablesample/Makefile b/src/backend/access/tablesample/Makefile
new file mode 100644
index 0000000..46eeb59
--- /dev/null
+++ b/src/backend/access/tablesample/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for access/tablesample
+#
+# IDENTIFICATION
+# src/backend/access/tablesample/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/access/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = tablesample.o system.o bernoulli.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/tablesample/bernoulli.c b/src/backend/access/tablesample/bernoulli.c
new file mode 100644
index 0000000..c91f3f5
--- /dev/null
+++ b/src/backend/access/tablesample/bernoulli.c
@@ -0,0 +1,235 @@
+/*-------------------------------------------------------------------------
+ *
+ * bernoulli.c
+ * interface routines for BERNOULLI tablesample method
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/access/tablesample/bernoulli.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/tablesample.h"
+#include "access/relscan.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* number of blocks */
+ BlockNumber blockno; /* current block */
+ float4 probability; /* probability that tuple will be returned (0.0-1.0) */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} BernoulliSamplerData;
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_bernoulli_init(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ HeapScanDesc scan = tsdesc->heapScan;
+ BernoulliSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(BernoulliSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->startblock = scan->rs_startblock;
+ sampler->nblocks = scan->rs_nblocks;
+ sampler->blockno = InvalidBlockNumber;
+ sampler->probability = percent / 100;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ tsdesc->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_bernoulli_nextblock(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) tsdesc->tsmdata;
+
+ /*
+ * Bernoulli sampling scans all blocks on the table and supports
+ * syncscan so loop from startblock to startblock instead of
+ * from 0 to nblocks.
+ */
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = sampler->startblock;
+ else
+ {
+ sampler->blockno++;
+
+ if (sampler->blockno >= sampler->nblocks)
+ sampler->blockno = 0;
+
+ if (sampler->blockno == sampler->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ *
+ * This method implements the main logic of Bernoulli sampling.
+ * The algorithm simply generates a new random number (in the 0.0-1.0 range)
+ * and, if it falls within the user-specified probability (in the same range),
+ * returns the tuple offset.
+ *
+ * It is OK here to return a tuple offset without knowing whether the tuple
+ * is visible, and without checking it via examinetuple, because we do the
+ * coin flip (random number generation) for every tuple in the table. Since
+ * all tuples have the same probability of being returned, visible and
+ * invisible tuples are selected in the same ratio as they occur in the
+ * table. There is therefore no skew towards either visible or invisible
+ * tuples, and the number of visible tuples returned from the executor node
+ * is the fraction of visible tuples that was specified in the input.
+ *
+ * This is faster than doing the coinflip in the examinetuple because we don't
+ * have to do visibility checks on uninteresting tuples.
+ *
+ * If we reach end of the block return InvalidOffsetNumber which tells
+ * SampleScan to go to next block.
+ */
+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) tsdesc->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+ float4 probability = sampler->probability;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ /*
+ * Loop over tuple offsets until the random generator returns value that
+ * is within the probability of returning the tuple or until we reach
+ * end of the block.
+ *
+ * (This is our implementation of bernoulli trial)
+ */
+ while (sampler_random_fract(sampler->randstate) > probability)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }
+
+ if (tupoffset > maxoffset)
+ /* Tell SampleScan that we want next block. */
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_bernoulli_end(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+
+ pfree(tsdesc->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_bernoulli_reset(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ BernoulliSamplerData *sampler =
+ (BernoulliSamplerData *) tsdesc->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_bernoulli_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ *pages = baserel->pages;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1f;
+ }
+
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
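The Bernoulli trial described in the comments of tsm_bernoulli_nexttuple can be sketched outside the backend as follows. This is a minimal standalone illustration, not the patch's code: demo_rng, demo_rng_seed, demo_rng_fract, next_sampled_offset and count_sampled are hypothetical names, and the xorshift generator merely stands in for the patch's SamplerRandomState. Offsets are 1-based, with 0 playing the role of InvalidOffsetNumber.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's SamplerRandomState. */
typedef struct
{
	uint64_t	s;
} demo_rng;

/* Seed the generator; xorshift needs a non-zero state. */
static void
demo_rng_seed(demo_rng *r, uint64_t seed)
{
	r->s = seed ? seed : 0x9E3779B97F4A7C15ULL;
}

/* One xorshift64* step, mapped to a double in [0, 1). */
static double
demo_rng_fract(demo_rng *r)
{
	r->s ^= r->s >> 12;
	r->s ^= r->s << 25;
	r->s ^= r->s >> 27;
	return (double) ((r->s * 2685821657736338717ULL) >> 11) / 9007199254740992.0;
}

/*
 * Return the next selected offset after "last" (0 means no tuple was
 * returned from this block yet), or 0 once the block is exhausted.
 * Exactly one coin flip is made for every offset that is visited.
 */
static unsigned
next_sampled_offset(demo_rng *r, unsigned last, unsigned maxoffset,
					double probability)
{
	unsigned	off;

	assert(probability >= 0.0 && probability <= 1.0);

	for (off = last + 1; off <= maxoffset; off++)
	{
		if (demo_rng_fract(r) < probability)
			return off;
	}
	return 0;
}

/* Count how many offsets of one block a full pass selects. */
static unsigned
count_sampled(uint64_t seed, unsigned maxoffset, double probability)
{
	demo_rng	r;
	unsigned	last = 0;
	unsigned	n = 0;

	demo_rng_seed(&r, seed);
	while ((last = next_sampled_offset(&r, last, maxoffset, probability)) != 0)
		n++;
	return n;
}
```

Because the generator is reseeded from the same value, two passes with the same seed select the same offsets, which is the property the REPEATABLE clause relies on as long as the physical layout of the table does not change.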
diff --git a/src/backend/access/tablesample/system.c b/src/backend/access/tablesample/system.c
new file mode 100644
index 0000000..1412e51
--- /dev/null
+++ b/src/backend/access/tablesample/system.c
@@ -0,0 +1,186 @@
+/*-------------------------------------------------------------------------
+ *
+ * system.c
+ * interface routines for system tablesample method
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/access/tablesample/system.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/tablesample.h"
+#include "access/relscan.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/sampling.h"
+
+
+/*
+ * State
+ */
+typedef struct
+{
+ BlockSamplerData bs;
+ uint32 seed; /* random seed */
+ BlockNumber nblocks; /* number of blocks in relation */
+ int samplesize; /* number of blocks to return */
+ OffsetNumber lt; /* last tuple returned from current block */
+} SystemSamplerData;
+
+
+/*
+ * Initializes the state.
+ */
+Datum
+tsm_system_init(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ float4 percent = PG_ARGISNULL(2) ? -1 : PG_GETARG_FLOAT4(2);
+ HeapScanDesc scan = tsdesc->heapScan;
+ SystemSamplerData *sampler;
+
+ if (percent < 0 || percent > 100)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric value between 0 and 100 (inclusive).")));
+
+ sampler = palloc0(sizeof(SystemSamplerData));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->nblocks = scan->rs_nblocks;
+ sampler->samplesize = 1 + (int) (sampler->nblocks * (percent / 100.0));
+ sampler->lt = InvalidOffsetNumber;
+
+ BlockSampler_Init(&sampler->bs, sampler->nblocks, sampler->samplesize,
+ sampler->seed);
+
+ tsdesc->tsmdata = (void *) sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number or InvalidBlockNumber when we're done.
+ *
+ * Uses the same logic as ANALYZE for picking the random blocks.
+ */
+Datum
+tsm_system_nextblock(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) tsdesc->tsmdata;
+ BlockNumber blockno;
+
+ if (!BlockSampler_HasMore(&sampler->bs))
+ PG_RETURN_UINT32(InvalidBlockNumber);
+
+ blockno = BlockSampler_Next(&sampler->bs);
+
+ PG_RETURN_UINT32(blockno);
+}
+
+/*
+ * Get next tuple offset in current block or InvalidOffsetNumber if we are done
+ * with this block.
+ */
+Datum
+tsm_system_nexttuple(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ SystemSamplerData *sampler = (SystemSamplerData *) tsdesc->tsmdata;
+ OffsetNumber tupoffset = sampler->lt;
+
+ if (tupoffset == InvalidOffsetNumber)
+ tupoffset = FirstOffsetNumber;
+ else
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ tupoffset = InvalidOffsetNumber;
+
+ sampler->lt = tupoffset;
+
+ PG_RETURN_UINT16(tupoffset);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_system_end(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+
+ pfree(tsdesc->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_system_reset(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ SystemSamplerData *sampler = (SystemSamplerData *) tsdesc->tsmdata;
+
+ sampler->lt = InvalidOffsetNumber;
+ BlockSampler_Init(&sampler->bs, sampler->nblocks, sampler->samplesize,
+ sampler->seed);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
+{
+ PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ List *args = (List *) PG_GETARG_POINTER(3);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+ Node *pctnode;
+ float4 samplesize;
+
+ pctnode = linitial(args);
+ pctnode = estimate_expression_value(root, pctnode);
+
+ if (IsA(pctnode, RelabelType))
+ pctnode = (Node *) ((RelabelType *) pctnode)->arg;
+
+ if (IsA(pctnode, Const))
+ {
+ samplesize = DatumGetFloat4(((Const *) pctnode)->constvalue);
+ samplesize /= 100.0;
+ }
+ else
+ {
+ /* Default samplesize if the estimation didn't return Const. */
+ samplesize = 0.1f;
+ }
+
+ *pages = baserel->pages * samplesize;
+ *tuples = path->rows * samplesize;
+ path->rows = *tuples;
+
+ PG_RETURN_VOID();
+}
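The arithmetic used by the SYSTEM method above (the block count computed in tsm_system_init and the row scaling in tsm_system_cost) can be checked in isolation. The following is a minimal sketch with hypothetical helper names (demo_system_sample_blocks, demo_estimate_rows); it mirrors the formulas only and does not call any backend code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of blocks requested from the block sampler, following the
 * formula in tsm_system_init: 1 + (int) (nblocks * (percent / 100.0)).
 */
static int
demo_system_sample_blocks(uint32_t nblocks, float percent)
{
	assert(percent >= 0.0f && percent <= 100.0f);
	return 1 + (int) (nblocks * (percent / 100.0));
}

/*
 * Planner-side row estimate: scale the unsampled row count by the
 * sampled fraction, as the costing function does.
 */
static double
demo_estimate_rows(double rows, float percent)
{
	assert(percent >= 0.0f && percent <= 100.0f);
	return rows * (percent / 100.0);
}
```

Note that with this formula a request of 0 percent still asks for one block, and a request of 100 percent asks for one block more than the relation contains.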
diff --git a/src/backend/access/tablesample/tablesample.c b/src/backend/access/tablesample/tablesample.c
new file mode 100644
index 0000000..ef55d06
--- /dev/null
+++ b/src/backend/access/tablesample/tablesample.c
@@ -0,0 +1,368 @@
+/*-------------------------------------------------------------------------
+ *
+ * tablesample.c
+ * TABLESAMPLE internal API
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/access/tablesample/tablesample.c
+ *
+ * TABLESAMPLE is the SQL standard clause for sampling the relations.
+ *
+ * This API is the interface between the executor and the TABLESAMPLE methods.
+ *
+ * TABLESAMPLE Methods are implementations of actual sampling algorithms which
+ * can be used for returning a sample of the source relation.
+ * Methods don't read the table directly but are asked for block number and
+ * tuple offset which they want to examine (or return) and the tablesample
+ * interface implemented here does the reading for them.
+ *
+ * We currently only support sampling of the physical relations, but in the
+ * future we might extend the API to support subqueries as well.
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/tablesample.h"
+
+#include "catalog/pg_tablesample_method.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/predicate.h"
+#include "utils/rel.h"
+#include "utils/tqual.h"
+
+
+static bool SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset, HeapScanDesc scan);
+
+
+/*
+ * Initialize the TABLESAMPLE Descriptor and the TABLESAMPLE Method.
+ */
+TableSampleDesc *
+tablesample_init(SampleScanState *scanstate, TableSampleClause *tablesample)
+{
+ FunctionCallInfoData fcinfo;
+ int i;
+ List *args = tablesample->args;
+ ListCell *arg;
+ ExprContext *econtext = scanstate->ss.ps.ps_ExprContext;
+ TableSampleDesc *tsdesc = (TableSampleDesc *) palloc0(sizeof(TableSampleDesc));
+
+ /* Load functions */
+ fmgr_info(tablesample->tsminit, &(tsdesc->tsminit));
+ fmgr_info(tablesample->tsmnextblock, &(tsdesc->tsmnextblock));
+ fmgr_info(tablesample->tsmnexttuple, &(tsdesc->tsmnexttuple));
+ if (OidIsValid(tablesample->tsmexaminetuple))
+ fmgr_info(tablesample->tsmexaminetuple, &(tsdesc->tsmexaminetuple));
+ else
+ tsdesc->tsmexaminetuple.fn_oid = InvalidOid;
+ fmgr_info(tablesample->tsmreset, &(tsdesc->tsmreset));
+ fmgr_info(tablesample->tsmend, &(tsdesc->tsmend));
+
+ InitFunctionCallInfoData(fcinfo, &tsdesc->tsminit,
+ list_length(args) + 2,
+ InvalidOid, NULL, NULL);
+
+ tsdesc->tupDesc = scanstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+ tsdesc->heapScan = scanstate->ss.ss_currentScanDesc;
+
+ /* First argument for init function is always TableSampleDesc */
+ fcinfo.arg[0] = PointerGetDatum(tsdesc);
+ fcinfo.argnull[0] = false;
+
+ /*
+ * The second argument for the init function is always the REPEATABLE seed.
+ * When tablesample->repeatable is NULL, the REPEATABLE clause was not
+ * specified and a random seed is used instead.
+ * When specified, the expression cannot evaluate to NULL.
+ */
+ if (tablesample->repeatable)
+ {
+ ExprState *argstate = ExecInitExpr((Expr *) tablesample->repeatable,
+ (PlanState *) scanstate);
+ fcinfo.arg[1] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[1], NULL);
+ if (fcinfo.argnull[1])
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value")));
+ }
+ else
+ {
+ fcinfo.arg[1] = UInt32GetDatum(random());
+ fcinfo.argnull[1] = false;
+ }
+
+ /* Rest of the arguments come from user. */
+ i = 2;
+ foreach(arg, args)
+ {
+ Expr *argexpr = (Expr *) lfirst(arg);
+ ExprState *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+ if (argstate == NULL)
+ {
+ fcinfo.argnull[i] = true;
+ fcinfo.arg[i] = (Datum) 0;
+ }
+ else
+ fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+ &fcinfo.argnull[i], NULL);
+ i++;
+ }
+ Assert(i == fcinfo.nargs);
+
+ (void) FunctionCallInvoke(&fcinfo);
+
+ return tsdesc;
+}
+
+/*
+ * Get next tuple from TABLESAMPLE Method.
+ */
+HeapTuple
+tablesample_getnext(TableSampleDesc *desc)
+{
+ HeapScanDesc scan = desc->heapScan;
+ HeapTuple tuple = &(scan->rs_ctup);
+ bool pagemode = scan->rs_pageatatime;
+ BlockNumber blockno;
+ Page page;
+ bool page_all_visible;
+ ItemId itemid;
+ OffsetNumber tupoffset,
+ maxoffset;
+
+ if (!scan->rs_inited)
+ {
+ /*
+ * return null immediately if relation is empty
+ */
+ if (scan->rs_nblocks == 0)
+ {
+ Assert(!BufferIsValid(scan->rs_cbuf));
+ tuple->t_data = NULL;
+ return NULL;
+ }
+ blockno = DatumGetUInt32(FunctionCall1(&desc->tsmnextblock,
+ PointerGetDatum(desc)));
+ if (!BlockNumberIsValid(blockno))
+ {
+ tuple->t_data = NULL;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+ scan->rs_inited = true;
+ }
+ else
+ {
+ /* continue from previously returned page/tuple */
+ blockno = scan->rs_cblock; /* current page */
+ }
+
+ /*
+ * When pagemode is disabled, the scan will do visibility checks for each
+ * tuple it finds so the buffer needs to be locked.
+ */
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ page_all_visible = PageIsAllVisible(page);
+ maxoffset = PageGetMaxOffsetNumber(page);
+
+ for (;;)
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ tupoffset = DatumGetUInt16(FunctionCall3(&desc->tsmnexttuple,
+ PointerGetDatum(desc),
+ UInt32GetDatum(blockno),
+ UInt16GetDatum(maxoffset)));
+
+ if (OffsetNumberIsValid(tupoffset))
+ {
+ bool visible;
+ bool found;
+
+ /* Skip invalid tuple pointers. */
+ itemid = PageGetItemId(page, tupoffset);
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ tuple->t_data = (HeapTupleHeader) PageGetItem((Page) page, itemid);
+ tuple->t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&(tuple->t_self), blockno, tupoffset);
+
+ if (page_all_visible)
+ visible = true;
+ else
+ visible = SampleTupleVisible(tuple, tupoffset, scan);
+
+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples for
+ * statistical purposes, but not return them since user should
+ * never see invisible tuples.
+ */
+ if (OidIsValid(desc->tsmexaminetuple.fn_oid))
+ {
+ found = DatumGetBool(FunctionCall4(&desc->tsmexaminetuple,
+ PointerGetDatum(desc),
+ UInt32GetDatum(blockno),
+ PointerGetDatum(tuple),
+ BoolGetDatum(visible)));
+ /* Should not happen if sampling method is well written. */
+ if (found && !visible)
+ elog(ERROR, "Sampling method wanted to return invisible tuple");
+ }
+ else
+ found = visible;
+
+ /* Found visible tuple, return it. */
+ if (found)
+ {
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+ break;
+ }
+ else
+ {
+ /* Try next tuple from same page. */
+ continue;
+ }
+ }
+
+
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
+
+ blockno = DatumGetUInt32(FunctionCall1(&desc->tsmnextblock,
+ PointerGetDatum(desc)));
+
+ /*
+ * Report our new scan position for synchronization purposes. We
+ * don't do that when moving backwards, however. That would just
+ * mess up any other forward-moving scanners.
+ *
+ * Note: we do this before checking for end of scan so that the
+ * final state of the position hint is back at the start of the
+ * rel. That's not strictly necessary, but otherwise when you run
+ * the same query multiple times the starting position would shift
+ * a little bit backwards on every invocation, which is confusing.
+ * We don't guarantee any specific ordering in general, though.
+ */
+ if (scan->rs_syncscan)
+ ss_report_location(scan->rs_rd, BlockNumberIsValid(blockno) ?
+ blockno : scan->rs_startblock);
+
+ /*
+ * Reached end of scan.
+ */
+ if (!BlockNumberIsValid(blockno))
+ {
+ if (BufferIsValid(scan->rs_cbuf))
+ ReleaseBuffer(scan->rs_cbuf);
+ scan->rs_cbuf = InvalidBuffer;
+ scan->rs_cblock = InvalidBlockNumber;
+ tuple->t_data = NULL;
+ scan->rs_inited = false;
+ return NULL;
+ }
+
+ heapgetpage(scan, blockno);
+
+ if (!pagemode)
+ LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
+
+ page = (Page) BufferGetPage(scan->rs_cbuf);
+ page_all_visible = PageIsAllVisible(page);
+ maxoffset = PageGetMaxOffsetNumber(page);
+ }
+
+ pgstat_count_heap_getnext(scan->rs_rd);
+
+ return &(scan->rs_ctup);
+}
+
+/*
+ * Reset the sampling to starting state
+ */
+void
+tablesample_reset(TableSampleDesc *desc)
+{
+ (void) FunctionCall1(&desc->tsmreset, PointerGetDatum(desc));
+}
+
+/*
+ * Signal the sampling method that the scan has finished.
+ */
+void
+tablesample_end(TableSampleDesc *desc)
+{
+ (void) FunctionCall1(&desc->tsmend, PointerGetDatum(desc));
+}
+
+/*
+ * Check visibility of the tuple.
+ */
+static bool
+SampleTupleVisible(HeapTuple tuple, OffsetNumber tupoffset, HeapScanDesc scan)
+{
+ /*
+ * If this scan is reading whole pages at a time, there is already
+ * visibility info present in rs_vistuples so we can just search it
+ * for the tupoffset.
+ */
+ if (scan->rs_pageatatime)
+ {
+ int start = 0,
+ end = scan->rs_ntuples - 1;
+
+ /*
+ * Do the binary search over rs_vistuples, it's already sorted by
+ * OffsetNumber so we don't need to do any sorting ourselves here.
+ *
+ * We could use bsearch() here, but it's slower for integers because
+ * of the function call overhead, and since it needs boilerplate code
+ * it would not save us anything code-wise anyway.
+ */
+ while (start <= end)
+ {
+ int mid = start + (end - start) / 2;
+ OffsetNumber curoffset = scan->rs_vistuples[mid];
+
+ if (curoffset == tupoffset)
+ return true;
+ else if (curoffset > tupoffset)
+ end = mid - 1;
+ else
+ start = mid + 1;
+ }
+
+ return false;
+ }
+ else
+ {
+ /* No pagemode, we have to check the tuple itself. */
+ Snapshot snapshot = scan->rs_snapshot;
+ Buffer buffer = scan->rs_cbuf;
+
+ bool visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
+
+ CheckForSerializableConflictOut(visible, scan->rs_rd, tuple, buffer,
+ snapshot);
+
+ return visible;
+ }
+}
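The page-mode branch of SampleTupleVisible amounts to a binary search over an ascending array of visible tuple offsets. A minimal standalone sketch of that lookup, assuming a plain uint16_t array in place of rs_vistuples (offset_is_visible and demo_vis are hypothetical names used only for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sample data: offsets of visible tuples on one page, in ascending order. */
static const uint16_t demo_vis[] = {1, 3, 4, 7, 9};

/*
 * Return true if "target" occurs in the ascending array "vis" of length
 * "nvis". The midpoint is computed as start + (end - start) / 2 so that
 * the addition cannot overflow.
 */
static bool
offset_is_visible(const uint16_t *vis, int nvis, uint16_t target)
{
	int			start = 0;
	int			end = nvis - 1;

	while (start <= end)
	{
		int			mid = start + (end - start) / 2;

		if (vis[mid] == target)
			return true;
		else if (vis[mid] > target)
			end = mid - 1;
		else
			start = mid + 1;
	}
	return false;
}
```

An empty array (nvis of 0) makes the loop body unreachable, so every offset is reported as not visible in that case.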
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..34db3e6 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h pg_policy.h \
+ pg_foreign_table.h pg_policy.h pg_tablesample_method.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 952cf20..65e329e 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1150,7 +1150,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * sampler_random_fract());
+ int k = (int) (targrows * sampler_random_fract(rstate.randstate));
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 315a528..17088f3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -732,6 +732,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
@@ -957,6 +958,21 @@ ExplainNode(PlanState *planstate, List *ancestors,
else
pname = sname;
break;
+ case T_SampleScan:
+ {
+ /*
+ * Fetch the tablesample method name from RTE.
+ *
+ * It would be nice to also show the parameters, but since we
+ * support arbitrary expressions as parameters, the output
+ * might get quite messy.
+ */
+ RangeTblEntry *rte = rt_fetch(((SampleScan *) plan)->scanrelid, es->rtable);
+ sname = "Sample Scan";	/* node type name for non-text output */
+ custom_name = get_tablesample_method_name(rte->tablesample->tsmid);
+ pname = psprintf("Sample Scan (%s)", custom_name);
+ }
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1074,6 +1090,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_WorkTableScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
@@ -1326,6 +1343,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_CteScan:
case T_WorkTableScan:
case T_SubqueryScan:
+ case T_SampleScan:
show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
@@ -2224,6 +2242,7 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
case T_TidScan:
case T_ForeignScan:
case T_CustomScan:
+ case T_SampleScan:
case T_ModifyTable:
/* Assert it's on a real relation */
Assert(rte->rtekind == RTE_RELATION);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..75f799c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -21,7 +21,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeLimit.o nodeLockRows.o \
nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \
- nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
+ nodeSamplescan.o nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 6ebad2f..4948a26 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -39,6 +39,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -155,6 +156,10 @@ ExecReScan(PlanState *node)
ExecReScanSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecReScanSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecReScanIndexScan((IndexScanState *) node);
break;
@@ -480,6 +485,9 @@ ExecSupportsBackwardScan(Plan *node)
}
return false;
+ case T_SampleScan:
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index d87be96..bcd287f 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -261,6 +261,7 @@ search_plan_tree(PlanState *node, Oid table_oid)
* Relation scan nodes can all be treated alike
*/
case T_SeqScanState:
+ case T_SampleScanState:
case T_IndexScanState:
case T_IndexOnlyScanState:
case T_BitmapHeapScanState:
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 9892499..03c2feb 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -102,6 +102,7 @@
#include "executor/nodeNestloop.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
+#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
@@ -190,6 +191,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_SampleScan:
+ result = (PlanState *) ExecInitSampleScan((SampleScan *) node,
+ estate, eflags);
+ break;
+
case T_IndexScan:
result = (PlanState *) ExecInitIndexScan((IndexScan *) node,
estate, eflags);
@@ -406,6 +412,10 @@ ExecProcNode(PlanState *node)
result = ExecSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ result = ExecSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
result = ExecIndexScan((IndexScanState *) node);
break;
@@ -644,6 +654,10 @@ ExecEndNode(PlanState *node)
ExecEndSeqScan((SeqScanState *) node);
break;
+ case T_SampleScanState:
+ ExecEndSampleScan((SampleScanState *) node);
+ break;
+
case T_IndexScanState:
ExecEndIndexScan((IndexScanState *) node);
break;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
new file mode 100644
index 0000000..fc89d1d
--- /dev/null
+++ b/src/backend/executor/nodeSamplescan.c
@@ -0,0 +1,256 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.c
+ * Support routines for sample scans of relations (table sampling).
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeSamplescan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/tablesample.h"
+#include "executor/executor.h"
+#include "executor/nodeSamplescan.h"
+#include "miscadmin.h"
+#include "parser/parsetree.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/predicate.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+#include "utils/tqual.h"
+
+static void InitScanRelation(SampleScanState *node, EState *estate,
+ int eflags, TableSampleClause *tablesample);
+static TupleTableSlot *SampleNext(SampleScanState *node);
+
+
+/* ----------------------------------------------------------------
+ * Scan Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * SampleNext
+ *
+ * This is a workhorse for ExecSampleScan
+ * ----------------------------------------------------------------
+ */
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
+ TupleTableSlot *slot;
+ TableSampleDesc *tsdesc;
+ HeapTuple tuple;
+
+ /*
+ * get information from the scan state
+ */
+ slot = node->ss.ss_ScanTupleSlot;
+ tsdesc = node->tsdesc;
+
+ tuple = tablesample_getnext(tsdesc);
+
+ if (tuple)
+ ExecStoreTuple(tuple, /* tuple to store */
+ slot, /* slot to store in */
+ tsdesc->heapScan->rs_cbuf, /* buffer associated with this tuple */
+ false); /* don't pfree this pointer */
+ else
+ ExecClearTuple(slot);
+
+ return slot;
+}
+
+/*
+ * SampleRecheck -- access method routine to recheck a tuple in EvalPlanQual
+ */
+static bool
+SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
+{
+ /* No need to recheck for SampleScan */
+ return true;
+}
+
+/* ----------------------------------------------------------------
+ * ExecSampleScan(node)
+ *
+ * Scans the relation using the sampling method and returns
+ * the next qualifying tuple.
+ * We call the ExecScan() routine and pass it the appropriate
+ * access method functions.
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecSampleScan(SampleScanState *node)
+{
+ return ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SampleNext,
+ (ExecScanRecheckMtd) SampleRecheck);
+}
+
+/* ----------------------------------------------------------------
+ * InitScanRelation
+ *
+ * Set up to access the scan relation.
+ * ----------------------------------------------------------------
+ */
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+ TableSampleClause *tablesample)
+{
+ Relation currentRelation;
+
+ /*
+ * get the relation object id from the relid'th entry in the range table,
+ * open that relation and acquire appropriate lock on it.
+ */
+ currentRelation = ExecOpenScanRelation(estate,
+ ((SampleScan *) node->ss.ps.plan)->scanrelid,
+ eflags);
+
+ node->ss.ss_currentRelation = currentRelation;
+
+ /*
+ * Even though we aren't going to do a conventional seqscan, it is useful
+ * to create a HeapScanDesc --- many of the fields in it are usable.
+ */
+ node->ss.ss_currentScanDesc =
+ heap_beginscan_sampling(currentRelation, estate->es_snapshot, 0, NULL,
+ tablesample->tsmseqscan,
+ tablesample->tsmpagemode);
+
+ /* and report the scan tuple slot's rowtype */
+ ExecAssignScanType(&node->ss, RelationGetDescr(currentRelation));
+}
+
+
+/* ----------------------------------------------------------------
+ * ExecInitSampleScan
+ * ----------------------------------------------------------------
+ */
+SampleScanState *
+ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
+{
+ SampleScanState *scanstate;
+ RangeTblEntry *rte = rt_fetch(node->scanrelid,
+ estate->es_range_table);
+
+ Assert(outerPlan(node) == NULL);
+ Assert(innerPlan(node) == NULL);
+ Assert(rte->tablesample != NULL);
+
+ /*
+ * create state structure
+ */
+ scanstate = makeNode(SampleScanState);
+ scanstate->ss.ps.plan = (Plan *) node;
+ scanstate->ss.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &scanstate->ss.ps);
+
+ /*
+ * initialize child expressions
+ */
+ scanstate->ss.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->plan.targetlist,
+ (PlanState *) scanstate);
+ scanstate->ss.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->plan.qual,
+ (PlanState *) scanstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &scanstate->ss.ps);
+ ExecInitScanTupleSlot(estate, &scanstate->ss);
+
+ /*
+ * initialize scan relation
+ */
+ InitScanRelation(scanstate, estate, eflags, rte->tablesample);
+
+ scanstate->ss.ps.ps_TupFromTlist = false;
+
+ /*
+ * Initialize result tuple type and projection info.
+ */
+ ExecAssignResultTypeFromTL(&scanstate->ss.ps);
+ ExecAssignScanProjectionInfo(&scanstate->ss);
+
+ scanstate->tsdesc = tablesample_init(scanstate, rte->tablesample);
+
+ return scanstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndSampleScan
+ *
+ * frees any storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndSampleScan(SampleScanState *node)
+{
+ /*
+ * Tell sampling function that we finished the scan.
+ */
+ tablesample_end(node->tsdesc);
+
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->ss.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+ /*
+ * close heap scan
+ */
+ heap_endscan(node->ss.ss_currentScanDesc);
+
+ /*
+ * close the heap relation.
+ */
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
+}
+
+/* ----------------------------------------------------------------
+ * Join Support
+ * ----------------------------------------------------------------
+ */
+
+/* ----------------------------------------------------------------
+ * ExecReScanSampleScan
+ *
+ * Rescans the relation.
+ *
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanSampleScan(SampleScanState *node)
+{
+ heap_rescan(node->ss.ss_currentScanDesc, NULL);
+
+ /*
+ * Tell sampling function to reset its state for rescan.
+ */
+ tablesample_reset(node->tsdesc);
+
+ ExecScanReScan(&node->ss);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 029761e..1a4c85b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -630,6 +630,22 @@ _copyCustomScan(const CustomScan *from)
}
/*
+ * _copySampleScan
+ */
+static SampleScan *
+_copySampleScan(const SampleScan *from)
+{
+ SampleScan *newnode = makeNode(SampleScan);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -2015,6 +2031,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(rtekind);
COPY_SCALAR_FIELD(relid);
COPY_SCALAR_FIELD(relkind);
+ COPY_NODE_FIELD(tablesample);
COPY_NODE_FIELD(subquery);
COPY_SCALAR_FIELD(security_barrier);
COPY_SCALAR_FIELD(jointype);
@@ -2147,6 +2164,40 @@ _copyCommonTableExpr(const CommonTableExpr *from)
return newnode;
}
+static RangeTableSample *
+_copyRangeTableSample(const RangeTableSample *from)
+{
+ RangeTableSample *newnode = makeNode(RangeTableSample);
+
+ COPY_NODE_FIELD(relation);
+ COPY_STRING_FIELD(method);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
+static TableSampleClause *
+_copyTableSampleClause(const TableSampleClause *from)
+{
+ TableSampleClause *newnode = makeNode(TableSampleClause);
+
+ COPY_SCALAR_FIELD(tsmid);
+ COPY_SCALAR_FIELD(tsmseqscan);
+ COPY_SCALAR_FIELD(tsmpagemode);
+ COPY_SCALAR_FIELD(tsminit);
+ COPY_SCALAR_FIELD(tsmnextblock);
+ COPY_SCALAR_FIELD(tsmnexttuple);
+ COPY_SCALAR_FIELD(tsmexaminetuple);
+ COPY_SCALAR_FIELD(tsmend);
+ COPY_SCALAR_FIELD(tsmreset);
+ COPY_SCALAR_FIELD(tsmcost);
+ COPY_NODE_FIELD(repeatable);
+ COPY_NODE_FIELD(args);
+
+ return newnode;
+}
+
static A_Expr *
_copyAExpr(const A_Expr *from)
{
@@ -4084,6 +4135,9 @@ copyObject(const void *from)
case T_CustomScan:
retval = _copyCustomScan(from);
break;
+ case T_SampleScan:
+ retval = _copySampleScan(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4732,6 +4786,12 @@ copyObject(const void *from)
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
+ case T_RangeTableSample:
+ retval = _copyRangeTableSample(from);
+ break;
+ case T_TableSampleClause:
+ retval = _copyTableSampleClause(from);
+ break;
case T_FuncWithArgs:
retval = _copyFuncWithArgs(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 190e50a..27626b5 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2318,6 +2318,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(rtekind);
COMPARE_SCALAR_FIELD(relid);
COMPARE_SCALAR_FIELD(relkind);
+ COMPARE_NODE_FIELD(tablesample);
COMPARE_NODE_FIELD(subquery);
COMPARE_SCALAR_FIELD(security_barrier);
COMPARE_SCALAR_FIELD(jointype);
@@ -2437,6 +2438,36 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
}
static bool
+_equalRangeTableSample(const RangeTableSample *a, const RangeTableSample *b)
+{
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_STRING_FIELD(method);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
+_equalTableSampleClause(const TableSampleClause *a, const TableSampleClause *b)
+{
+ COMPARE_SCALAR_FIELD(tsmid);
+ COMPARE_SCALAR_FIELD(tsmseqscan);
+ COMPARE_SCALAR_FIELD(tsmpagemode);
+ COMPARE_SCALAR_FIELD(tsminit);
+ COMPARE_SCALAR_FIELD(tsmnextblock);
+ COMPARE_SCALAR_FIELD(tsmnexttuple);
+ COMPARE_SCALAR_FIELD(tsmexaminetuple);
+ COMPARE_SCALAR_FIELD(tsmend);
+ COMPARE_SCALAR_FIELD(tsmreset);
+ COMPARE_SCALAR_FIELD(tsmcost);
+ COMPARE_NODE_FIELD(repeatable);
+ COMPARE_NODE_FIELD(args);
+
+ return true;
+}
+
+static bool
_equalXmlSerialize(const XmlSerialize *a, const XmlSerialize *b)
{
COMPARE_SCALAR_FIELD(xmloption);
@@ -3155,6 +3186,12 @@ equal(const void *a, const void *b)
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
+ case T_RangeTableSample:
+ retval = _equalRangeTableSample(a, b);
+ break;
+ case T_TableSampleClause:
+ retval = _equalTableSampleClause(a, b);
+ break;
case T_FuncWithArgs:
retval = _equalFuncWithArgs(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index d6f1f5b..7742189 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3219,6 +3219,18 @@ raw_expression_tree_walker(Node *node,
return walker(((WithClause *) node)->ctes, context);
case T_CommonTableExpr:
return walker(((CommonTableExpr *) node)->ctequery, context);
+ case T_RangeTableSample:
+ {
+ RangeTableSample *rts = (RangeTableSample *) node;
+
+ if (walker(rts->relation, context))
+ return true;
+ if (walker(rts->repeatable, context))
+ return true;
+ if (walker(rts->args, context))
+ return true;
+ }
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) nodeTag(node));
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 385b289..e26dbf0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -580,6 +580,14 @@ _outCustomScan(StringInfo str, const CustomScan *node)
}
static void
+_outSampleScan(StringInfo str, const SampleScan *node)
+{
+ WRITE_NODE_TYPE("SAMPLESCAN");
+
+ _outScanInfo(str, (const Scan *) node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -2404,6 +2412,36 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
}
static void
+_outRangeTableSample(StringInfo str, const RangeTableSample *node)
+{
+ WRITE_NODE_TYPE("RANGETABLESAMPLE");
+
+ WRITE_NODE_FIELD(relation);
+ WRITE_STRING_FIELD(method);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
+_outTableSampleClause(StringInfo str, const TableSampleClause *node)
+{
+ WRITE_NODE_TYPE("TABLESAMPLECLAUSE");
+
+ WRITE_OID_FIELD(tsmid);
+ WRITE_BOOL_FIELD(tsmseqscan);
+ WRITE_BOOL_FIELD(tsmpagemode);
+ WRITE_OID_FIELD(tsminit);
+ WRITE_OID_FIELD(tsmnextblock);
+ WRITE_OID_FIELD(tsmnexttuple);
+ WRITE_OID_FIELD(tsmexaminetuple);
+ WRITE_OID_FIELD(tsmend);
+ WRITE_OID_FIELD(tsmreset);
+ WRITE_OID_FIELD(tsmcost);
+ WRITE_NODE_FIELD(repeatable);
+ WRITE_NODE_FIELD(args);
+}
+
+static void
_outSetOperationStmt(StringInfo str, const SetOperationStmt *node)
{
WRITE_NODE_TYPE("SETOPERATIONSTMT");
@@ -2433,6 +2471,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
case RTE_RELATION:
WRITE_OID_FIELD(relid);
WRITE_CHAR_FIELD(relkind);
+ WRITE_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
WRITE_NODE_FIELD(subquery);
@@ -2931,6 +2970,9 @@ _outNode(StringInfo str, const void *obj)
case T_CustomScan:
_outCustomScan(str, obj);
break;
+ case T_SampleScan:
+ _outSampleScan(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3272,6 +3314,12 @@ _outNode(StringInfo str, const void *obj)
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
+ case T_RangeTableSample:
+ _outRangeTableSample(str, obj);
+ break;
+ case T_TableSampleClause:
+ _outTableSampleClause(str, obj);
+ break;
case T_SetOperationStmt:
_outSetOperationStmt(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 563209c..05ed9a8 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -350,6 +350,46 @@ _readCommonTableExpr(void)
}
/*
+ * _readRangeTableSample
+ */
+static RangeTableSample *
+_readRangeTableSample(void)
+{
+ READ_LOCALS(RangeTableSample);
+
+ READ_NODE_FIELD(relation);
+ READ_STRING_FIELD(method);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
+ * _readTableSampleClause
+ */
+static TableSampleClause *
+_readTableSampleClause(void)
+{
+ READ_LOCALS(TableSampleClause);
+
+ READ_OID_FIELD(tsmid);
+ READ_BOOL_FIELD(tsmseqscan);
+ READ_BOOL_FIELD(tsmpagemode);
+ READ_OID_FIELD(tsminit);
+ READ_OID_FIELD(tsmnextblock);
+ READ_OID_FIELD(tsmnexttuple);
+ READ_OID_FIELD(tsmexaminetuple);
+ READ_OID_FIELD(tsmend);
+ READ_OID_FIELD(tsmreset);
+ READ_OID_FIELD(tsmcost);
+ READ_NODE_FIELD(repeatable);
+ READ_NODE_FIELD(args);
+
+ READ_DONE();
+}
+
+/*
* _readSetOperationStmt
*/
static SetOperationStmt *
@@ -1218,6 +1258,7 @@ _readRangeTblEntry(void)
case RTE_RELATION:
READ_OID_FIELD(relid);
READ_CHAR_FIELD(relkind);
+ READ_NODE_FIELD(tablesample);
break;
case RTE_SUBQUERY:
READ_NODE_FIELD(subquery);
@@ -1313,6 +1354,10 @@ parseNodeString(void)
return_value = _readRowMarkClause();
else if (MATCH("COMMONTABLEEXPR", 15))
return_value = _readCommonTableExpr();
+ else if (MATCH("RANGETABLESAMPLE", 16))
+ return_value = _readRangeTableSample();
+ else if (MATCH("TABLESAMPLECLAUSE", 17))
+ return_value = _readTableSampleClause();
else if (MATCH("SETOPERATIONSTMT", 16))
return_value = _readSetOperationStmt();
else if (MATCH("ALIAS", 5))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..5f12477 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -71,6 +71,10 @@ static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
+static void set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
+static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+ RangeTblEntry *rte);
static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -265,6 +269,11 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_size(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Sampled relation */
+ set_tablesample_rel_size(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -332,6 +341,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }
else
{
/* Plain relation */
@@ -418,6 +432,41 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * set_tablesample_rel_size
+ * Set size estimates for a sampled relation.
+ */
+static void
+set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ /* Mark rel with estimated output rows, width, etc */
+ set_baserel_size_estimates(root, rel);
+}
+
+/*
+ * set_tablesample_rel_pathlist
+ * Build access paths for a sampled relation
+ *
+ * There is only one possible path - sampling scan
+ */
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relids required_outer;
+ Path *path;
+
+ /*
+ * We don't support pushing join clauses into the quals of a sample scan,
+ * but it could still have required parameterization due to LATERAL refs
+ * in its tlist.
+ */
+ required_outer = rel->lateral_relids;
+
+ /* We only do sample scan if it was requested */
+ path = create_samplescan_path(root, rel, required_outer);
+ rel->pathlist = list_make1(path);
+}
+
+/*
* set_foreign_size
* Set size estimates for a foreign table RTE
*/
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 1a0d358..c2b2b76 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -220,6 +220,73 @@ cost_seqscan(Path *path, PlannerInfo *root,
}
/*
+ * cost_samplescan
+ * Determines and returns the cost of scanning a relation using sampling.
+ *
+ * From the planner's perspective the cost itself matters little, since a
+ * sample scan is always the only scan path considered for a sampled
+ * relation; the row count estimate, however, is still important.
+ *
+ * 'path' is the path to cost; its param_info is set if it is parameterized
+ * 'baserel' is the relation to be scanned
+ */
+void
+cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
+{
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ double spc_seq_page_cost,
+ spc_random_page_cost,
+ spc_page_cost;
+ QualCost qpqual_cost;
+ Cost cpu_per_tuple;
+ BlockNumber pages;
+ double tuples;
+ RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
+ TableSampleClause *tablesample = rte->tablesample;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Mark the path with the correct row estimate */
+ if (path->param_info)
+ path->rows = path->param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ /* Call the sampling method's costing function. */
+ OidFunctionCall6(tablesample->tsmcost, PointerGetDatum(root),
+ PointerGetDatum(path), PointerGetDatum(baserel),
+ PointerGetDatum(tablesample->args),
+ PointerGetDatum(&pages), PointerGetDatum(&tuples));
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ &spc_seq_page_cost);
+
+
+ spc_page_cost = tablesample->tsmseqscan ? spc_seq_page_cost :
+ spc_random_page_cost;
+
+ /*
+ * disk costs
+ */
+ run_cost += spc_page_cost * pages;
+
+ /* CPU costs */
+ get_restriction_qual_cost(root, baserel, path->param_info, &qpqual_cost);
+
+ startup_cost += qpqual_cost.startup;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+ run_cost += cpu_per_tuple * tuples;
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + run_cost;
+}
+
+/*
* cost_index
* Determines and returns the cost of scanning a relation using an index.
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index cb69c03..3fc84e2 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -58,6 +58,8 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
List *tlist, List *scan_clauses);
+static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses);
static Scan *create_indexscan_plan(PlannerInfo *root, IndexPath *best_path,
List *tlist, List *scan_clauses, bool indexonly);
static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
@@ -100,6 +102,7 @@ static List *order_qual_clauses(PlannerInfo *root, List *clauses);
static void copy_path_costsize(Plan *dest, Path *src);
static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
+static SampleScan *make_samplescan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
List *indexorderby, List *indexorderbyorig,
@@ -228,6 +231,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
switch (best_path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -343,6 +347,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
scan_clauses);
break;
+ case T_SampleScan:
+ plan = (Plan *) create_samplescan_plan(root,
+ best_path,
+ tlist,
+ scan_clauses);
+ break;
+
case T_IndexScan:
plan = (Plan *) create_indexscan_plan(root,
(IndexPath *) best_path,
@@ -546,6 +557,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
switch (path->pathtype)
{
case T_SeqScan:
+ case T_SampleScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
@@ -1133,6 +1145,45 @@ create_seqscan_plan(PlannerInfo *root, Path *best_path,
}
/*
+ * create_samplescan_plan
+ * Returns a samplescan plan for the base relation scanned by 'best_path'
+ * with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static SampleScan *
+create_samplescan_plan(PlannerInfo *root, Path *best_path,
+ List *tlist, List *scan_clauses)
+{
+ SampleScan *scan_plan;
+ Index scan_relid = best_path->parent->relid;
+
+ /* it should be a base rel with tablesample clause... */
+ Assert(scan_relid > 0);
+ Assert(best_path->parent->rtekind == RTE_RELATION);
+ Assert(best_path->pathtype == T_SampleScan);
+
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+ /* Replace any outer-relation variables with nestloop params */
+ if (best_path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ scan_plan = make_samplescan(tlist,
+ scan_clauses,
+ scan_relid);
+
+ copy_path_costsize(&scan_plan->plan, best_path);
+
+ return scan_plan;
+}
+
+/*
* create_indexscan_plan
* Returns an indexscan plan for the base relation scanned by 'best_path'
* with restriction clauses 'scan_clauses' and targetlist 'tlist'.
@@ -3321,6 +3372,24 @@ make_seqscan(List *qptlist,
return node;
}
+static SampleScan *
+make_samplescan(List *qptlist,
+ List *qpqual,
+ Index scanrelid)
+{
+ SampleScan *node = makeNode(SampleScan);
+ Plan *plan = &node->plan;
+
+ /* cost should be inserted by caller */
+ plan->targetlist = qptlist;
+ plan->qual = qpqual;
+ plan->lefttree = NULL;
+ plan->righttree = NULL;
+ node->scanrelid = scanrelid;
+
+ return node;
+}
+
static IndexScan *
make_indexscan(List *qptlist,
List *qpqual,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 94b12ab..a2ae940 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -445,6 +445,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->plan.qual, rtoffset);
}
break;
+ case T_SampleScan:
+ {
+ SampleScan *splan = (SampleScan *) plan;
+
+ splan->scanrelid += rtoffset;
+ splan->plan.targetlist =
+ fix_scan_list(root, splan->plan.targetlist, rtoffset);
+ splan->plan.qual =
+ fix_scan_list(root, splan->plan.qual, rtoffset);
+ }
+ break;
case T_IndexScan:
{
IndexScan *splan = (IndexScan *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index acfd0bc..9971b54 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2167,6 +2167,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
break;
case T_SeqScan:
+ case T_SampleScan:
context.paramids = bms_add_members(context.paramids, scan_params);
break;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index faca30b..ea7a47b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -706,6 +706,26 @@ create_seqscan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
}
/*
+ * create_samplescan_path
+ * Like seqscan but uses sampling function while scanning.
+ */
+Path *
+create_samplescan_path(PlannerInfo *root, RelOptInfo *rel, Relids required_outer)
+{
+ Path *pathnode = makeNode(Path);
+
+ pathnode->pathtype = T_SampleScan;
+ pathnode->parent = rel;
+ pathnode->param_info = get_baserel_parampathinfo(root, rel,
+ required_outer);
+ pathnode->pathkeys = NIL; /* samplescan has unordered result */
+
+ cost_samplescan(pathnode, root, rel);
+
+ return pathnode;
+}
+
+/*
* create_index_path
* Creates a path node for an index scan.
*
@@ -1778,6 +1798,8 @@ reparameterize_path(PlannerInfo *root, Path *path,
case T_SubqueryScan:
return create_subqueryscan_path(root, rel, path->pathkeys,
required_outer);
+ case T_SampleScan:
+ return (Path *) create_samplescan_path(root, rel, required_outer);
default:
break;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 5818858..d5405ad 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -448,6 +448,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <range> relation_expr
%type <range> relation_expr_opt_alias
%type <target> target_el single_set_clause set_target insert_column_item
+%type <node> relation_expr_tablesample tablesample_clause opt_repeatable_clause
%type <str> generic_option_name
%type <node> generic_option_arg
@@ -615,8 +616,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
- TABLE TABLES TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP
- TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
+ TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
+ TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -10191,6 +10192,10 @@ table_ref: relation_expr opt_alias_clause
$1->alias = $2;
$$ = (Node *) $1;
}
+ | relation_expr_tablesample
+ {
+ $$ = (Node *) $1;
+ }
| func_table func_alias_clause
{
RangeFunction *n = (RangeFunction *) $1;
@@ -10516,6 +10521,32 @@ relation_expr_opt_alias: relation_expr %prec UMINUS
}
;
+
+relation_expr_tablesample: relation_expr opt_alias_clause tablesample_clause
+ {
+ RangeTableSample *n = (RangeTableSample *) $3;
+ n->relation = $1;
+ n->relation->alias = $2;
+ $$ = (Node *) n;
+ }
+ ;
+
+tablesample_clause:
+ TABLESAMPLE ColId '(' expr_list ')' opt_repeatable_clause
+ {
+ RangeTableSample *n = makeNode(RangeTableSample);
+ n->method = $2;
+ n->args = $4;
+ n->repeatable = $6;
+ $$ = (Node *) n;
+ }
+ ;
+
+opt_repeatable_clause:
+ REPEATABLE '(' a_expr ')' { $$ = (Node *) $3; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
/*
* func_table represents a function invocation in a FROM list. It can be
* a plain function call, like "foo(...)", or a ROWS FROM expression with
@@ -13606,6 +13637,7 @@ type_func_name_keyword:
| OVERLAPS
| RIGHT
| SIMILAR
+ | TABLESAMPLE
| VERBOSE
;
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 8d90b50..d512ce1 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/heapam.h"
+#include "access/htup_details.h"
#include "catalog/heap.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -29,6 +30,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
@@ -36,6 +38,7 @@
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/syscache.h"
/* Convenience macro for the most common makeNamespaceItem() case */
@@ -414,6 +417,39 @@ transformJoinOnClause(ParseState *pstate, JoinExpr *j, List *namespace)
return result;
}
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *rv)
+{
+ RangeTblEntry *rte = NULL;
+ CommonTableExpr *cte = NULL;
+ TableSampleClause *tablesample = NULL;
+
+ /* if relation has an unqualified name, it might be a CTE reference */
+ if (!rv->relation->schemaname)
+ {
+ Index levelsup;
+ cte = scanNameSpaceForCTE(pstate, rv->relation->relname, &levelsup);
+ }
+
+ /* We first need to build a range table entry */
+ if (!cte)
+ rte = transformTableEntry(pstate, rv->relation);
+
+ if (!rte ||
+ (rte->relkind != RELKIND_RELATION &&
+ rte->relkind != RELKIND_MATVIEW))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("TABLESAMPLE clause can only be used on tables and materialized views"),
+ parser_errposition(pstate, rv->relation->location)));
+
+ tablesample = ParseTableSample(pstate, rv->method, rv->repeatable,
+ rv->args, rv->relation->location);
+ rte->tablesample = tablesample;
+
+ return rte;
+}
+
/*
* transformTableEntry --- transform a RangeVar (simple relation reference)
*/
@@ -1122,6 +1158,26 @@ transformFromClauseItem(ParseState *pstate, Node *n,
return (Node *) j;
}
+ else if (IsA(n, RangeTableSample))
+ {
+ /* Tablesample reference */
+ RangeTableSample *rv = (RangeTableSample *) n;
+ RangeTblRef *rtr;
+ RangeTblEntry *rte = NULL;
+ int rtindex;
+
+ rte = transformTableSampleEntry(pstate, rv);
+
+ /* assume new rte is at end */
+ rtindex = list_length(pstate->p_rtable);
+ Assert(rte == rt_fetch(rtindex, pstate->p_rtable));
+ *top_rte = rte;
+ *top_rti = rtindex;
+ *namespace = list_make1(makeDefaultNSItem(rte));
+ rtr = makeNode(RangeTblRef);
+ rtr->rtindex = rtindex;
+ return (Node *) rtr;
+ }
else
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n));
return NULL; /* can't get here, keep compiler quiet */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 1385776..ab87635 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -18,6 +18,7 @@
#include "catalog/pg_aggregate.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
+#include "catalog/pg_tablesample_method.h"
#include "funcapi.h"
#include "lib/stringinfo.h"
#include "nodes/makefuncs.h"
@@ -26,6 +27,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
#include "parser/parse_target.h"
#include "parser/parse_type.h"
@@ -767,6 +769,147 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
}
+/*
+ * ParseTableSample
+ *
+ * Parse TABLESAMPLE clause and process the arguments
+ */
+TableSampleClause *
+ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
+ List *sampleargs, int location)
+{
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ Form_pg_proc procform;
+ TableSampleClause *tablesample;
+ List *fargs;
+ ListCell *larg;
+ int nargs, initnargs;
+ Oid init_arg_types[FUNC_MAX_ARGS];
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(samplemethod));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ samplemethod),
+ parser_errposition(pstate, location)));
+
+ tablesample = makeNode(TableSampleClause);
+ tablesample->tsmid = HeapTupleGetOid(tuple);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+
+ tablesample->tsmseqscan = tsm->tsmseqscan;
+ tablesample->tsmpagemode = tsm->tsmpagemode;
+ tablesample->tsminit = tsm->tsminit;
+ tablesample->tsmnextblock = tsm->tsmnextblock;
+ tablesample->tsmnexttuple = tsm->tsmnexttuple;
+ tablesample->tsmexaminetuple = tsm->tsmexaminetuple;
+ tablesample->tsmend = tsm->tsmend;
+ tablesample->tsmreset = tsm->tsmreset;
+ tablesample->tsmcost = tsm->tsmcost;
+
+ ReleaseSysCache(tuple);
+
+ /* Validate the parameters against init function definition. */
+ tuple = SearchSysCache1(PROCOID,
+ ObjectIdGetDatum(tablesample->tsminit));
+
+ if (!HeapTupleIsValid(tuple)) /* should not happen */
+ elog(ERROR, "cache lookup failed for function %u",
+ tablesample->tsminit);
+
+ procform = (Form_pg_proc) GETSTRUCT(tuple);
+ initnargs = procform->pronargs;
+ Assert(initnargs >= 3);
+
+ /*
+ * The first parameter passes the SampleScanState and the second is the
+ * seed (REPEATABLE). Skip processing for them here and just assert
+ * that their types are correct.
+ */
+ Assert(procform->proargtypes.values[0] == INTERNALOID);
+ Assert(procform->proargtypes.values[1] == INT4OID);
+ initnargs -= 2;
+ memcpy(init_arg_types, procform->proargtypes.values + 2,
+ initnargs * sizeof(Oid));
+
+ /* Now we are done with the catalog */
+ ReleaseSysCache(tuple);
+
+ /* Process repeatable (seed) */
+ if (repeatable != NULL)
+ {
+ Node *arg = repeatable;
+
+ if (arg && IsA(arg, A_Const))
+ {
+ A_Const *con = (A_Const *) arg;
+
+ if (con->val.type == T_Null)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("REPEATABLE clause must be NOT NULL numeric value"),
+ parser_errposition(pstate, con->location)));
+
+ }
+
+ arg = transformExpr(pstate, arg, EXPR_KIND_FROM_FUNCTION);
+ arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
+ tablesample->repeatable = arg;
+ }
+ else
+ tablesample->repeatable = NULL;
+
+ /* Check that the user provided the expected number of arguments. */
+ if (list_length(sampleargs) != initnargs)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_ARGUMENTS),
+ errmsg_plural("tablesample method \"%s\" expects %d argument got %d",
+ "tablesample method \"%s\" expects %d arguments got %d",
+ initnargs,
+ samplemethod,
+ initnargs, list_length(sampleargs)),
+ parser_errposition(pstate, location)));
+
+ /* Transform the arguments, typecasting them as needed. */
+ fargs = NIL;
+ nargs = 0;
+ foreach(larg, sampleargs)
+ {
+ Node *inarg = (Node *) lfirst(larg);
+ Node *arg = transformExpr(pstate, inarg, EXPR_KIND_FROM_FUNCTION);
+ Oid argtype = exprType(arg);
+
+ if (argtype != init_arg_types[nargs])
+ {
+ if (!can_coerce_type(1, &argtype, &init_arg_types[nargs],
+ COERCION_IMPLICIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("wrong parameter %d for tablesample method \"%s\"",
+ nargs + 1, samplemethod),
+ errdetail("Expected type %s got %s.",
+ format_type_be(init_arg_types[nargs]),
+ format_type_be(argtype)),
+ parser_errposition(pstate, exprLocation(inarg))));
+
+ arg = coerce_type(pstate, arg, argtype, init_arg_types[nargs], -1,
+ COERCION_IMPLICIT, COERCE_IMPLICIT_CAST, -1);
+ }
+
+ fargs = lappend(fargs, arg);
+ nargs++;
+ }
+
+ /* Pass the arguments down */
+ tablesample->args = fargs;
+
+ return tablesample;
+}
+
/* func_match_argtypes()
*
* Given a list of candidate functions (having the right name and number
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 9d2c280..385ae9c 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -2160,6 +2160,9 @@ view_query_is_auto_updatable(Query *viewquery, bool check_cols)
base_rte->relkind != RELKIND_VIEW))
return gettext_noop("Views that do not select from a single table or view are not automatically updatable.");
+ if (base_rte->tablesample)
+ return gettext_noop("Views containing TABLESAMPLE are not automatically updatable.");
+
/*
* Check that the view has at least one updatable column. This is required
* for INSERT/UPDATE but not for DELETE.
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 29b5b1b..a6bd34c 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -32,6 +32,7 @@
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
@@ -344,6 +345,8 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags);
static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
int prettyFlags, int wrapColumn);
+static void get_tablesample_def(TableSampleClause *tablesample,
+ deparse_context *context);
static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
TupleDesc resultDesc,
int prettyFlags, int wrapColumn, int startIndent);
@@ -4185,6 +4188,50 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
heap_close(ev_relation, AccessShareLock);
}
+/* ----------
+ * get_tablesample_def - Convert TableSampleClause back to SQL
+ * ----------
+ */
+static void
+get_tablesample_def(TableSampleClause *tablesample, deparse_context *context)
+{
+ StringInfo buf = context->buf;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+ char *tsmname;
+ int nargs;
+ ListCell *l;
+
+ /* Load the tablesample method */
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tablesample->tsmid));
+ if (!HeapTupleIsValid(tuple))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("cache lookup failed for tablesample method %u",
+ tablesample->tsmid)));
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ tsmname = NameStr(tsm->tsmname);
+ appendStringInfo(buf, " TABLESAMPLE %s (", quote_identifier(tsmname));
+
+ ReleaseSysCache(tuple);
+
+ nargs = 0;
+ foreach(l, tablesample->args)
+ {
+ if (nargs++ > 0)
+ appendStringInfoString(buf, ", ");
+ get_rule_expr((Node *) lfirst(l), context, true);
+ }
+ appendStringInfoChar(buf, ')');
+
+ if (tablesample->repeatable != NULL)
+ {
+ appendStringInfoString(buf, " REPEATABLE (");
+ get_rule_expr(tablesample->repeatable, context, true);
+ appendStringInfoChar(buf, ')');
+ }
+}
/* ----------
* get_query_def - Parse back one query parsetree
@@ -8453,6 +8500,9 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
only_marker(rte),
generate_relation_name(rte->relid,
context->namespaces));
+
+ if (rte->tablesample)
+ get_tablesample_def(rte->tablesample, context);
break;
case RTE_SUBQUERY:
/* Subquery RTE */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 6a39863..9be3d64 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -30,6 +30,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_range.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_type.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -2925,3 +2926,29 @@ get_range_subtype(Oid rangeOid)
else
return InvalidOid;
}
+
+/* ---------- PG_TABLESAMPLE_METHOD CACHE ---------- */
+
+/*
+ * get_tablesample_method_name - given a tablesample method OID,
+ * look up the name or NULL if not found
+ */
+char *
+get_tablesample_method_name(Oid tsmid)
+{
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tsmid));
+ if (HeapTupleIsValid(tuple))
+ {
+ Form_pg_tablesample_method tup =
+ (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ char *result;
+
+ result = pstrdup(NameStr(tup->tsmname));
+ ReleaseSysCache(tuple);
+ return result;
+ }
+ else
+ return NULL;
+}
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index bd27168..3a8f01e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -55,6 +55,7 @@
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_statistic.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
@@ -642,6 +643,28 @@ static const struct cachedesc cacheinfo[] = {
},
128
},
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODNAME */
+ TableSampleMethodNameIndexId,
+ 1,
+ {
+ Anum_pg_tablesample_method_tsmname,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
+ {TableSampleMethodRelationId, /* TABLESAMPLEMETHODOID */
+ TableSampleMethodOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0,
+ },
+ 2
+ },
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index 1eeabaf..9becc63 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -46,6 +46,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
bs->n = samplesize;
bs->t = 0; /* blocks scanned so far */
bs->m = 0; /* blocks selected so far */
+
+ sampler_random_init_state(randseed, bs->randstate);
}
bool
@@ -92,7 +94,7 @@ BlockSampler_Next(BlockSampler bs)
* less than k, which means that we cannot fail to select enough blocks.
*----------
*/
- V = sampler_random_fract();
+ V = sampler_random_fract(bs->randstate);
p = 1.0 - (double) k / (double) K;
while (V < p)
{
@@ -126,8 +128,14 @@ BlockSampler_Next(BlockSampler bs)
void
reservoir_init_selection_state(ReservoirState rs, int n)
{
+ /*
+ * Reservoir sampling is not used anywhere where it would need to return
+ * repeatable results so we can initialize it randomly.
+ */
+ sampler_random_init_state(random(), rs->randstate);
+
/* Initial value of W (for use when Algorithm Z is first applied) */
- *rs = exp(-log(sampler_random_fract()) / n);
+ rs->W = exp(-log(sampler_random_fract(rs->randstate)) / n);
}
double
@@ -142,7 +150,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
double V,
quot;
- V = sampler_random_fract(); /* Generate V */
+ V = sampler_random_fract(rs->randstate); /* Generate V */
S = 0;
t += 1;
/* Note: "num" in Vitter's code is always equal to t - n */
@@ -158,7 +166,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
else
{
/* Now apply Algorithm Z */
- double W = *rs;
+ double W = rs->W;
double term = t - (double) n + 1;
for (;;)
@@ -174,7 +182,7 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
tmp;
/* Generate U and X */
- U = sampler_random_fract();
+ U = sampler_random_fract(rs->randstate);
X = t * (W - 1.0);
S = floor(X); /* S is tentatively set to floor(X) */
/* Test if U <= h(S)/cg(X) in the manner of (6.3) */
@@ -203,11 +211,11 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
y *= numer / denom;
denom -= 1;
}
- W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ W = exp(-log(sampler_random_fract(rs->randstate)) / n); /* Generate W in advance */
if (exp(log(y) / n) <= (t + X) / t)
break;
}
- *rs = W;
+ rs->W = W;
}
return S;
}
@@ -217,10 +225,17 @@ reservoir_get_next_S(ReservoirState rs, double t, int n)
* Random number generator used by sampling
*----------
*/
+void
+sampler_random_init_state(long seed, SamplerRandomState randstate)
+{
+ randstate[0] = RAND48_SEED_0;
+ randstate[1] = (unsigned short) seed;
+ randstate[2] = (unsigned short) (seed >> 16);
+}
/* Select a random value R uniformly distributed in (0 - 1) */
double
-sampler_random_fract()
+sampler_random_fract(SamplerRandomState randstate)
{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+ return pg_erand48(randstate);
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 888cce7..3174cc1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -113,8 +113,12 @@ extern HeapScanDesc heap_beginscan_strat(Relation relation, Snapshot snapshot,
bool allow_strat, bool allow_sync);
extern HeapScanDesc heap_beginscan_bm(Relation relation, Snapshot snapshot,
int nkeys, ScanKey key);
+extern HeapScanDesc heap_beginscan_sampling(Relation relation,
+ Snapshot snapshot, int nkeys, ScanKey key,
+ bool allow_strat, bool allow_pagemode);
extern void heap_setscanlimits(HeapScanDesc scan, BlockNumber startBlk,
BlockNumber endBlk);
+extern void heapgetpage(HeapScanDesc scan, BlockNumber page);
extern void heap_rescan(HeapScanDesc scan, ScanKey key);
extern void heap_endscan(HeapScanDesc scan);
extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 9bb6362..e2b2b4f 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -29,6 +29,7 @@ typedef struct HeapScanDescData
int rs_nkeys; /* number of scan keys */
ScanKey rs_key; /* array of scan key descriptors */
bool rs_bitmapscan; /* true if this is really a bitmap scan */
+ bool rs_samplescan; /* true if this is really a sample scan */
bool rs_pageatatime; /* verify visibility page-at-a-time? */
bool rs_allow_strat; /* allow or disallow use of access strategy */
bool rs_allow_sync; /* allow or disallow use of syncscan */
diff --git a/src/include/access/tablesample.h b/src/include/access/tablesample.h
new file mode 100644
index 0000000..222fa8d
--- /dev/null
+++ b/src/include/access/tablesample.h
@@ -0,0 +1,60 @@
+/*-------------------------------------------------------------------------
+ *
+ * tablesample.h
+ * Public header file for TABLESAMPLE clause interface
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/access/tablesample.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef TABLESAMPLE_H
+#define TABLESAMPLE_H
+
+#include "access/relscan.h"
+#include "executor/executor.h"
+
+typedef struct TableSampleDesc {
+ HeapScanDesc heapScan;
+ TupleDesc tupDesc; /* Mostly useful for tsmexaminetuple */
+
+ void *tsmdata; /* private method data */
+
+ /* These point to the functions of the TABLESAMPLE method. */
+ FmgrInfo tsminit;
+ FmgrInfo tsmnextblock;
+ FmgrInfo tsmnexttuple;
+ FmgrInfo tsmexaminetuple;
+ FmgrInfo tsmreset;
+ FmgrInfo tsmend;
+} TableSampleDesc;
+
+
+extern TableSampleDesc *tablesample_init(SampleScanState *scanstate,
+ TableSampleClause *tablesample);
+extern HeapTuple tablesample_getnext(TableSampleDesc *desc);
+extern void tablesample_reset(TableSampleDesc *desc);
+extern void tablesample_end(TableSampleDesc *desc);
+extern HeapTuple tablesample_source_getnext(TableSampleDesc *desc);
+extern HeapTuple tablesample_source_gettup(TableSampleDesc *desc, ItemPointer tid,
+ bool *visible);
+
+extern Datum tsm_system_init(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_system_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_system_end(PG_FUNCTION_ARGS);
+extern Datum tsm_system_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_system_cost(PG_FUNCTION_ARGS);
+
+extern Datum tsm_bernoulli_init(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nextblock(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_end(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_reset(PG_FUNCTION_ARGS);
+extern Datum tsm_bernoulli_cost(PG_FUNCTION_ARGS);
+
+
+#endif
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index a680229..e01bd0c 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -305,6 +305,11 @@ DECLARE_UNIQUE_INDEX(pg_policy_oid_index, 3257, on pg_policy using btree(oid oid
DECLARE_UNIQUE_INDEX(pg_policy_polrelid_polname_index, 3258, on pg_policy using btree(polrelid oid_ops, polname name_ops));
#define PolicyPolrelidPolnameIndexId 3258
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_name_index, 3291, on pg_tablesample_method using btree(tsmname name_ops));
+#define TableSampleMethodNameIndexId 3291
+DECLARE_UNIQUE_INDEX(pg_tablesample_method_oid_index, 3292, on pg_tablesample_method using btree(oid oid_ops));
+#define TableSampleMethodOidIndexId 3292
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8469c82..1ef8198 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5201,6 +5201,32 @@ DESCR("for use by pg_upgrade");
DATA(insert OID = 3591 ( binary_upgrade_create_empty_extension PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "25 25 16 25 1028 1009 1009" _null_ _null_ _null_ _null_ binary_upgrade_create_empty_extension _null_ _null_ _null_ ));
DESCR("for use by pg_upgrade");
+/* tablesample */
+DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DESCR("tsm_system_init(internal)");
+DATA(insert OID = 3296 ( tsm_system_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_system_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_system_nextblock(internal)");
+DATA(insert OID = 3297 ( tsm_system_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_system_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_system_nexttuple(internal)");
+DATA(insert OID = 3298 ( tsm_system_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_end _null_ _null_ _null_ ));
+DESCR("tsm_system_end(internal)");
+DATA(insert OID = 3299 ( tsm_system_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_system_reset _null_ _null_ _null_ ));
+DESCR("tsm_system_reset(internal)");
+DATA(insert OID = 3300 ( tsm_system_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_system_cost _null_ _null_ _null_ ));
+DESCR("tsm_system_cost(internal)");
+
+DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_init(internal)");
+DATA(insert OID = 3302 ( tsm_bernoulli_nextblock PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 23 "2281 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nextblock _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nextblock(internal)");
+DATA(insert OID = 3303 ( tsm_bernoulli_nexttuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 4 0 21 "2281 23 21 16" _null_ _null_ _null_ _null_ tsm_bernoulli_nexttuple _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_nexttuple(internal)");
+DATA(insert OID = 3304 ( tsm_bernoulli_end PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_end _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_end(internal)");
+DATA(insert OID = 3306 ( tsm_bernoulli_reset PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "2281" _null_ _null_ _null_ _null_ tsm_bernoulli_reset _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_reset(internal)");
+DATA(insert OID = 3307 ( tsm_bernoulli_cost PGNSP PGUID 12 1 0 0 0 f f f f t f v 7 0 2278 "2281 2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ tsm_bernoulli_cost _null_ _null_ _null_ ));
+DESCR("tsm_bernoulli_cost(internal)");
/*
* Symbolic values for provolatile column: these indicate whether the result
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
new file mode 100644
index 0000000..a58e1cf
--- /dev/null
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -0,0 +1,78 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_tablesample_method.h
+ * definition of the tablesample methods.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_tablesample_method.h
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_TABLESAMPLE_METHOD_H
+#define PG_TABLESAMPLE_METHOD_H
+
+#include "catalog/genbki.h"
+#include "catalog/objectaddress.h"
+
+/* ----------------
+ * pg_tablesample_method definition. cpp turns this into
+ * typedef struct FormData_pg_tablesample_method
+ * ----------------
+ */
+#define TableSampleMethodRelationId 3290
+
+CATALOG(pg_tablesample_method,3290)
+{
+ NameData tsmname; /* tablesample method name */
+ bool tsmseqscan; /* does this method scan whole table sequentially? */
+ bool tsmpagemode; /* does this method scan page at a time? */
+ regproc tsminit; /* init scan function */
+ regproc tsmnextblock; /* function returning next block to sample
+ or InvalidBlockNumber if finished */
+ regproc tsmnexttuple; /* function returning next tuple offset from current block
+ or InvalidOffsetNumber if end of the block was reached */
+ regproc tsmexaminetuple; /* optional function which can examine tuple contents and
+ decide if tuple should be returned or not */
+ regproc tsmend; /* end scan function*/
+ regproc tsmreset; /* reset state - used by rescan */
+ regproc tsmcost; /* costing function */
+} FormData_pg_tablesample_method;
+
+/* ----------------
+ * Form_pg_tablesample_method corresponds to a pointer to a tuple with
+ * the format of pg_tablesample_method relation.
+ * ----------------
+ */
+typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
+
+/* ----------------
+ * compiler constants for pg_tablesample_method
+ * ----------------
+ */
+#define Natts_pg_tablesample_method 10
+#define Anum_pg_tablesample_method_tsmname 1
+#define Anum_pg_tablesample_method_tsmseqscan 2
+#define Anum_pg_tablesample_method_tsmpagemode 3
+#define Anum_pg_tablesample_method_tsminit 4
+#define Anum_pg_tablesample_method_tsmnextblock 5
+#define Anum_pg_tablesample_method_tsmnexttuple 6
+#define Anum_pg_tablesample_method_tsmexaminetuple 7
+#define Anum_pg_tablesample_method_tsmend 8
+#define Anum_pg_tablesample_method_tsmreset 9
+#define Anum_pg_tablesample_method_tsmcost 10
+
+/* ----------------
+ * initial contents of pg_tablesample_method
+ * ----------------
+ */
+
+DATA(insert OID = 3293 ( system false true tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
+DESCR("SYSTEM table sampling method");
+DATA(insert OID = 3294 ( bernoulli true false tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
+DESCR("BERNOULLI table sampling method");
+
+#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
new file mode 100644
index 0000000..4b769da
--- /dev/null
+++ b/src/include/executor/nodeSamplescan.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSamplescan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeSamplescan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODESAMPLESCAN_H
+#define NODESAMPLESCAN_H
+
+#include "nodes/execnodes.h"
+
+extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecSampleScan(SampleScanState *node);
+extern void ExecEndSampleScan(SampleScanState *node);
+extern void ExecReScanSampleScan(SampleScanState *node);
+
+#endif /* NODESAMPLESCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ac75f86..3697e21 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1216,6 +1216,15 @@ typedef struct ScanState
typedef ScanState SeqScanState;
/*
+ * SampleScan
+ */
+typedef struct SampleScanState
+{
+ ScanState ss;
+ struct TableSampleDesc *tsdesc;
+} SampleScanState;
+
+/*
* These structs store information about index quals that don't have simple
* constant right-hand sides. See comments for ExecIndexBuildScanKeys()
* for discussion.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 38469ef..caaedbf 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,6 +61,7 @@ typedef enum NodeTag
T_ValuesScan,
T_CteScan,
T_WorkTableScan,
+ T_SampleScan,
T_ForeignScan,
T_CustomScan,
T_Join,
@@ -97,6 +98,7 @@ typedef enum NodeTag
T_BitmapOrState,
T_ScanState,
T_SeqScanState,
+ T_SampleScanState,
T_IndexScanState,
T_IndexOnlyScanState,
T_BitmapIndexScanState,
@@ -414,6 +416,8 @@ typedef enum NodeTag
T_WithClause,
T_CommonTableExpr,
T_RoleSpec,
+ T_RangeTableSample,
+ T_TableSampleClause,
/*
* TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 0e257ac..aea499e 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -334,6 +334,26 @@ typedef struct FuncCall
} FuncCall;
/*
+ * TableSampleClause - information about a sampling method
+ */
+typedef struct TableSampleClause
+{
+ NodeTag type;
+ Oid tsmid;
+ bool tsmseqscan;
+ bool tsmpagemode;
+ Oid tsminit;
+ Oid tsmnextblock;
+ Oid tsmnexttuple;
+ Oid tsmexaminetuple;
+ Oid tsmend;
+ Oid tsmreset;
+ Oid tsmcost;
+ Node *repeatable;
+ List *args;
+} TableSampleClause;
+
+/*
* A_Star - '*' representing all columns of a table or compound field
*
* This can appear within ColumnRef.fields, A_Indirection.indirection, and
@@ -534,6 +554,22 @@ typedef struct RangeFunction
} RangeFunction;
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method> (<params>) REPEATABLE (<num>)
+ *
+ * The SQL standard specifies only one parameter, the percentage. We also allow
+ * custom tablesample methods, which may need different input arguments, so we
+ * accept a list of arguments.
+ */
+typedef struct RangeTableSample
+{
+ NodeTag type;
+ RangeVar *relation;
+ char *method; /* sampling method */
+ Node *repeatable;
+ List *args; /* arguments for sampling method */
+} RangeTableSample;
+
+/*
* ColumnDef - column definition (used in various creates)
*
* If the column has a default value, we may have the value expression
@@ -769,6 +805,7 @@ typedef struct RangeTblEntry
*/
Oid relid; /* OID of the relation */
char relkind; /* relation kind (see pg_class.relkind) */
+ TableSampleClause *tablesample; /* sampling method and parameters */
/*
* Fields valid for a subquery RTE (else NULL):
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 21cbfa8..ddc3708 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -279,6 +279,12 @@ typedef struct Scan
typedef Scan SeqScan;
/* ----------------
+ * table sample scan node
+ * ----------------
+ */
+typedef Scan SampleScan;
+
+/* ----------------
* index scan node
*
* indexqualorig is an implicitly-ANDed list of index qual expressions, each
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..24003ae 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info);
+extern void cost_samplescan(Path *path, PlannerInfo *root, RelOptInfo *baserel);
extern void cost_index(IndexPath *path, PlannerInfo *root,
double loop_count);
extern void cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9923f0e..89c8ded 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -32,6 +32,8 @@ extern bool add_path_precheck(RelOptInfo *parent_rel,
extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
Relids required_outer);
+extern Path *create_samplescan_path(PlannerInfo *root, RelOptInfo *rel,
+ Relids required_outer);
extern IndexPath *create_index_path(PlannerInfo *root,
IndexOptInfo *index,
List *indexclauses,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c243ec..ae90df8 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -366,6 +366,7 @@ PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD)
PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD)
PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD)
+PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("tablespace", TABLESPACE, UNRESERVED_KEYWORD)
PG_KEYWORD("temp", TEMP, UNRESERVED_KEYWORD)
PG_KEYWORD("template", TEMPLATE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index 3264691..40c007c 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -33,6 +33,11 @@ typedef enum
extern Node *ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
FuncCall *fn, int location);
+extern TableSampleClause *ParseTableSample(ParseState *pstate,
+ char *samplemethod,
+ Node *repeatable, List *args,
+ int location);
+
extern FuncDetailCode func_get_detail(List *funcname,
List *fargs, List *fargnames,
int nargs, Oid *argtypes,
diff --git a/src/include/port.h b/src/include/port.h
index 3787cbf..71113c0 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -357,6 +357,10 @@ extern off_t ftello(FILE *stream);
#endif
#endif
+#define RAND48_SEED_0 (0x330e)
+#define RAND48_SEED_1 (0xabcd)
+#define RAND48_SEED_2 (0x1234)
+
extern double pg_erand48(unsigned short xseed[3]);
extern long pg_lrand48(void);
extern void pg_srand48(long seed);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 15bb6d9..ea1aa11 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -153,6 +153,7 @@ extern void free_attstatsslot(Oid atttype,
extern char *get_namespace_name(Oid nspid);
extern char *get_namespace_name_or_temp(Oid nspid);
extern Oid get_range_subtype(Oid rangeOid);
+extern char *get_tablesample_method_name(Oid tsmid);
#define type_is_array(typid) (get_element_type(typid) != InvalidOid)
/* type_is_array_domain accepts both plain arrays and domains over arrays */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 9e17d87..fd40366 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,6 @@ typedef struct RelationAmInfo
FmgrInfo amcanreturn;
} RelationAmInfo;
-
/*
* Here are the contents of a relation cache entry.
*/
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index e3e7f9c..4ac208d 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -15,7 +15,12 @@
#include "storage/bufmgr.h"
-extern double sampler_random_fract(void);
+/* Random generator for sampling code */
+typedef unsigned short SamplerRandomState[3];
+
+extern void sampler_random_init_state(long seed,
+ SamplerRandomState randstate);
+extern double sampler_random_fract(SamplerRandomState randstate);
/* Block sampling methods */
/* Data structure for Algorithm S from Knuth 3.4.2 */
@@ -25,6 +30,7 @@ typedef struct
int n; /* desired sample size */
BlockNumber t; /* current block number */
int m; /* blocks selected so far */
+ SamplerRandomState randstate; /* random generator state */
} BlockSamplerData;
typedef BlockSamplerData *BlockSampler;
@@ -35,7 +41,12 @@ extern bool BlockSampler_HasMore(BlockSampler bs);
extern BlockNumber BlockSampler_Next(BlockSampler bs);
/* Reservoid sampling methods */
-typedef double ReservoirStateData;
+typedef struct
+{
+ double W;
+ SamplerRandomState randstate; /* random generator state */
+} ReservoirStateData;
+
typedef ReservoirStateData *ReservoirState;
extern void reservoir_init_selection_state(ReservoirState rs, int n);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index ba0b090..6b628f6 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -79,6 +79,8 @@ enum SysCacheIdentifier
RELOID,
RULERELNAME,
STATRELATTINH,
+ TABLESAMPLEMETHODNAME,
+ TABLESAMPLEMETHODOID,
TABLESPACEOID,
TSCONFIGMAP,
TSCONFIGNAMENSP,
diff --git a/src/port/erand48.c b/src/port/erand48.c
index 9d47119..12efd81 100644
--- a/src/port/erand48.c
+++ b/src/port/erand48.c
@@ -33,9 +33,6 @@
#include <math.h>
-#define RAND48_SEED_0 (0x330e)
-#define RAND48_SEED_1 (0xabcd)
-#define RAND48_SEED_2 (0x1234)
#define RAND48_MULT_0 (0xe66d)
#define RAND48_MULT_1 (0xdeec)
#define RAND48_MULT_2 (0x0005)
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 44e8dab..79e98f7 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -101,6 +101,17 @@ NOTICE: f_leak => great manga
44 | 8 | 1 | rls_regress_user2 | great manga | manga
(4 rows)
+SELECT * FROM document TABLESAMPLE BERNOULLI (50) REPEATABLE(1) WHERE f_leak(dtitle) ORDER BY did;
+NOTICE: f_leak => my first novel
+NOTICE: f_leak => my first manga
+NOTICE: f_leak => great science fiction
+ did | cid | dlevel | dauthor | dtitle
+-----+-----+--------+-------------------+-----------------------
+ 1 | 11 | 1 | rls_regress_user1 | my first novel
+ 4 | 44 | 1 | rls_regress_user1 | my first manga
+ 6 | 22 | 1 | rls_regress_user2 | great science fiction
+(3 rows)
+
-- viewpoint from rls_regress_user2
SET SESSION AUTHORIZATION rls_regress_user2;
SELECT * FROM document WHERE f_leak(dtitle) ORDER BY did;
@@ -145,6 +156,21 @@ NOTICE: f_leak => great manga
44 | 8 | 1 | rls_regress_user2 | great manga | manga
(8 rows)
+SELECT * FROM document TABLESAMPLE BERNOULLI (50) REPEATABLE(1) WHERE f_leak(dtitle) ORDER BY did;
+NOTICE: f_leak => my first novel
+NOTICE: f_leak => my second novel
+NOTICE: f_leak => my first manga
+NOTICE: f_leak => great science fiction
+NOTICE: f_leak => great technology book
+ did | cid | dlevel | dauthor | dtitle
+-----+-----+--------+-------------------+-----------------------
+ 1 | 11 | 1 | rls_regress_user1 | my first novel
+ 2 | 11 | 2 | rls_regress_user1 | my second novel
+ 4 | 44 | 1 | rls_regress_user1 | my first manga
+ 6 | 22 | 1 | rls_regress_user2 | great science fiction
+ 7 | 33 | 2 | rls_regress_user2 | great technology book
+(5 rows)
+
EXPLAIN (COSTS OFF) SELECT * FROM document WHERE f_leak(dtitle);
QUERY PLAN
----------------------------------------------------------
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index c7be273..5946edf 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -127,6 +127,7 @@ pg_shdepend|t
pg_shdescription|t
pg_shseclabel|t
pg_statistic|t
+pg_tablesample_method|t
pg_tablespace|t
pg_trigger|t
pg_ts_config|t
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
new file mode 100644
index 0000000..271638d
--- /dev/null
+++ b/src/test/regress/expected/tablesample.out
@@ -0,0 +1,216 @@
+CREATE TABLE test_tablesample (id int, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+SELECT t.id FROM test_tablesample AS t TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+ id
+----
+ 6
+ 7
+ 8
+(3 rows)
+
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+ count
+-------
+ 10
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 2
+ 6
+ 7
+ 8
+ 9
+(7 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+ id
+----
+ 0
+ 1
+ 3
+ 4
+ 5
+(5 rows)
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ id
+----
+ 0
+ 5
+(2 rows)
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+ pg_get_viewdef
+--------------------------------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system (((10 * 2))::real) REPEATABLE (2);
+(1 row)
+
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+ pg_get_viewdef
+-----------------------------------------------------------
+ SELECT test_tablesample.id +
+ FROM test_tablesample TABLESAMPLE system ((99)::real);
+(1 row)
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ id
+----
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 9
+(7 rows)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+FETCH FIRST FROM tablesample_cur;
+ id
+----
+ 0
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 1
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 2
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 6
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 7
+(1 row)
+
+FETCH NEXT FROM tablesample_cur;
+ id
+----
+ 8
+(1 row)
+
+CLOSE tablesample_cur;
+END;
+EXPLAIN SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+ QUERY PLAN
+-------------------------------------------------------------------------------
+ Sample Scan (system) on test_tablesample (cost=0.00..26.35 rows=635 width=4)
+(1 row)
+
+EXPLAIN SELECT * FROM test_tablesample_v1;
+ QUERY PLAN
+-------------------------------------------------------------------------------
+ Sample Scan (system) on test_tablesample (cost=0.00..10.54 rows=254 width=4)
+(1 row)
+
+-- errors
+SELECT id FROM test_tablesample TABLESAMPLE FOOBAR (1);
+ERROR: tablesample method "foobar" does not exist
+LINE 1: SELECT id FROM test_tablesample TABLESAMPLE FOOBAR (1);
+ ^
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (NULL);
+ERROR: REPEATABLE clause must be NOT NULL numeric value
+LINE 1: ... test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (NULL);
+ ^
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (-1);
+ERROR: invalid sample size
+HINT: Sample size must be numeric value between 0 and 100 (inclusive).
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (200);
+ERROR: invalid sample size
+HINT: Sample size must be numeric value between 0 and 100 (inclusive).
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (-1);
+ERROR: invalid sample size
+HINT: Sample size must be numeric value between 0 and 100 (inclusive).
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (200);
+ERROR: invalid sample size
+HINT: Sample size must be numeric value between 0 and 100 (inclusive).
+SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1);
+ERROR: TABLESAMPLE clause can only be used on tables and materialized views
+LINE 1: SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1)...
+ ^
+INSERT INTO test_tablesample_v1 VALUES(1);
+ERROR: cannot insert into view "test_tablesample_v1"
+DETAIL: Views containing TABLESAMPLE are not automatically updatable.
+HINT: To enable inserting into the view, provide an INSTEAD OF INSERT trigger or an unconditional ON INSERT DO INSTEAD rule.
+WITH query_select AS (SELECT * FROM test_tablesample)
+SELECT * FROM query_select TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+ERROR: TABLESAMPLE clause can only be used on tables and materialized views
+LINE 2: SELECT * FROM query_select TABLESAMPLE BERNOULLI (5.5) REPEA...
+ ^
+SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5);
+ERROR: syntax error at or near "TABLESAMPLE"
+LINE 1: ...CT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPL...
+ ^
+-- done
+DROP TABLE test_tablesample CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to view test_tablesample_v1
+drop cascades to view test_tablesample_v2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 6d3b865..300e1fb 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,7 +83,7 @@ test: select_into select_distinct select_distinct_on select_implicit select_havi
# ----------
# Another group of parallel tests
# ----------
-test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address
+test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address tablesample
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 8326894..d815496 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -153,3 +153,4 @@ test: with
test: xml
test: event_trigger
test: stats
+test: tablesample
diff --git a/src/test/regress/sql/rowsecurity.sql b/src/test/regress/sql/rowsecurity.sql
index ed7adbf..71ad21f 100644
--- a/src/test/regress/sql/rowsecurity.sql
+++ b/src/test/regress/sql/rowsecurity.sql
@@ -94,11 +94,15 @@ SET row_security TO ON;
SELECT * FROM document WHERE f_leak(dtitle) ORDER BY did;
SELECT * FROM document NATURAL JOIN category WHERE f_leak(dtitle) ORDER BY did;
+SELECT * FROM document TABLESAMPLE BERNOULLI (50) REPEATABLE(1) WHERE f_leak(dtitle) ORDER BY did;
+
-- viewpoint from rls_regress_user2
SET SESSION AUTHORIZATION rls_regress_user2;
SELECT * FROM document WHERE f_leak(dtitle) ORDER BY did;
SELECT * FROM document NATURAL JOIN category WHERE f_leak(dtitle) ORDER BY did;
+SELECT * FROM document TABLESAMPLE BERNOULLI (50) REPEATABLE(1) WHERE f_leak(dtitle) ORDER BY did;
+
EXPLAIN (COSTS OFF) SELECT * FROM document WHERE f_leak(dtitle);
EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dtitle);
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
new file mode 100644
index 0000000..2f4b7de
--- /dev/null
+++ b/src/test/regress/sql/tablesample.sql
@@ -0,0 +1,61 @@
+CREATE TABLE test_tablesample (id int, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+
+INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
+
+SELECT t.id FROM test_tablesample AS t TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (9999);
+SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (100);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+CREATE VIEW test_tablesample_v1 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
+CREATE VIEW test_tablesample_v2 AS SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
+SELECT pg_get_viewdef('test_tablesample_v1'::regclass);
+SELECT pg_get_viewdef('test_tablesample_v2'::regclass);
+
+BEGIN;
+DECLARE tablesample_cur CURSOR FOR SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (100);
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+FETCH FIRST FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+FETCH NEXT FROM tablesample_cur;
+
+CLOSE tablesample_cur;
+END;
+
+EXPLAIN SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (10);
+EXPLAIN SELECT * FROM test_tablesample_v1;
+
+-- errors
+SELECT id FROM test_tablesample TABLESAMPLE FOOBAR (1);
+
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (NULL);
+
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (-1);
+SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (200);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (-1);
+SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (200);
+
+SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1);
+INSERT INTO test_tablesample_v1 VALUES(1);
+
+WITH query_select AS (SELECT * FROM test_tablesample)
+SELECT * FROM query_select TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
+
+SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5);
+
+-- done
+DROP TABLE test_tablesample CASCADE;
--
1.9.1
Attachment: 0006-tablesample-ddl-v8.patch (binary/octet-stream)
From eb0e40665f3bb2946cac2d539dc4e9dd4d135ca8 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:51:44 +0100
Subject: [PATCH 6/6] tablesample-ddl v8
---
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/create_tablesamplemethod.sgml | 184 ++++++++++
doc/src/sgml/ref/drop_tablesamplemethod.sgml | 87 +++++
doc/src/sgml/reference.sgml | 2 +
src/backend/catalog/dependency.c | 15 +-
src/backend/catalog/objectaddress.c | 65 +++-
src/backend/commands/Makefile | 6 +-
src/backend/commands/dropcmds.c | 4 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/tablecmds.c | 1 +
src/backend/commands/tablesample.c | 398 +++++++++++++++++++++
src/backend/parser/gram.y | 14 +-
src/backend/tcop/utility.c | 12 +
src/backend/utils/cache/lsyscache.c | 31 ++
src/bin/pg_dump/common.c | 5 +
src/bin/pg_dump/pg_dump.c | 177 +++++++++
src/bin/pg_dump/pg_dump.h | 11 +-
src/bin/pg_dump/pg_dump_sort.c | 11 +-
src/include/catalog/dependency.h | 1 +
src/include/catalog/pg_tablesample_method.h | 10 +
src/include/nodes/parsenodes.h | 1 +
src/include/parser/kwlist.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/test/modules/Makefile | 3 +-
src/test/modules/tablesample/.gitignore | 4 +
src/test/modules/tablesample/Makefile | 21 ++
.../modules/tablesample/expected/tablesample.out | 38 ++
src/test/modules/tablesample/sql/tablesample.sql | 14 +
src/test/modules/tablesample/tsm_test--1.0.sql | 52 +++
src/test/modules/tablesample/tsm_test.c | 224 ++++++++++++
src/test/modules/tablesample/tsm_test.control | 5 +
31 files changed, 1392 insertions(+), 11 deletions(-)
create mode 100644 doc/src/sgml/ref/create_tablesamplemethod.sgml
create mode 100644 doc/src/sgml/ref/drop_tablesamplemethod.sgml
create mode 100644 src/backend/commands/tablesample.c
create mode 100644 src/test/modules/tablesample/.gitignore
create mode 100644 src/test/modules/tablesample/Makefile
create mode 100644 src/test/modules/tablesample/expected/tablesample.out
create mode 100644 src/test/modules/tablesample/sql/tablesample.sql
create mode 100644 src/test/modules/tablesample/tsm_test--1.0.sql
create mode 100644 src/test/modules/tablesample/tsm_test.c
create mode 100644 src/test/modules/tablesample/tsm_test.control
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 211a3c4..93950c5 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -78,6 +78,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY createServer SYSTEM "create_server.sgml">
<!ENTITY createTable SYSTEM "create_table.sgml">
<!ENTITY createTableAs SYSTEM "create_table_as.sgml">
+<!ENTITY createTablesampleMethod SYSTEM "create_tablesamplemethod.sgml">
<!ENTITY createTableSpace SYSTEM "create_tablespace.sgml">
<!ENTITY createTrigger SYSTEM "create_trigger.sgml">
<!ENTITY createTSConfig SYSTEM "create_tsconfig.sgml">
@@ -119,6 +120,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY dropSequence SYSTEM "drop_sequence.sgml">
<!ENTITY dropServer SYSTEM "drop_server.sgml">
<!ENTITY dropTable SYSTEM "drop_table.sgml">
+<!ENTITY dropTablesampleMethod SYSTEM "drop_tablesamplemethod.sgml">
<!ENTITY dropTableSpace SYSTEM "drop_tablespace.sgml">
<!ENTITY dropTrigger SYSTEM "drop_trigger.sgml">
<!ENTITY dropTSConfig SYSTEM "drop_tsconfig.sgml">
diff --git a/doc/src/sgml/ref/create_tablesamplemethod.sgml b/doc/src/sgml/ref/create_tablesamplemethod.sgml
new file mode 100644
index 0000000..ff105d2
--- /dev/null
+++ b/doc/src/sgml/ref/create_tablesamplemethod.sgml
@@ -0,0 +1,184 @@
+<!--
+doc/src/sgml/ref/create_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-CREATETABLESAMPLEMETHOD">
+ <indexterm zone="sql-createtablesamplemethod">
+ <primary>CREATE TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>CREATE TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>CREATE TABLESAMPLE METHOD</refname>
+ <refpurpose>define a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+CREATE TABLESAMPLE METHOD <replaceable class="parameter">name</replaceable> (
+ INIT = <replaceable class="parameter">init_function</replaceable> ,
+ NEXTBLOCK = <replaceable class="parameter">nextblock_function</replaceable> ,
+ NEXTTUPLE = <replaceable class="parameter">nexttuple_function</replaceable> ,
+ END = <replaceable class="parameter">end_function</replaceable> ,
+ RESET = <replaceable class="parameter">reset_function</replaceable> ,
+ COST = <replaceable class="parameter">cost_function</replaceable>
+ [ , EXAMINETUPLE = <replaceable class="parameter">examinetuple_function</replaceable> ]
+ [ , SEQSCAN = <replaceable class="parameter">seqscan</replaceable> ]
+ [ , PAGEMODE = <replaceable class="parameter">pagemode</replaceable> ]
+)
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>CREATE TABLESAMPLE METHOD</command> creates a tablesample method.
+ A tablesample method provides an algorithm for reading a sample of a table
+ when used in the <command>TABLESAMPLE</> clause of a <command>SELECT</>
+ statement.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>CREATE TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of the tablesample method to be created. This name must be
+ unique within the database.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">init_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the init function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nextblock_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-block function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">nexttuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the get-next-tuple function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">end_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the end function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">reset_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the reset function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">cost_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the costing function for the tablesample method.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">examinetuple_function</replaceable></term>
+ <listitem>
+ <para>
+ The name of the function that inspects the tuple contents to decide
+ whether the tuple should be returned. This parameter
+ is optional.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">seqscan</replaceable></term>
+ <listitem>
+ <para>
+ True if the sampling method performs a sequential scan of the whole table.
+ Used for cost estimation and synchronized scans. The default value if not specified
+ is False.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">pagemode</replaceable></term>
+ <listitem>
+ <para>
+ True if the sampling method reads a whole page at a time. The default
+ value if not specified is False.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>
+ The function names can be schema-qualified if necessary. Argument types
+ are not given, since the argument list for each type of function is
+ predetermined. All functions except <replaceable class="parameter">examinetuple_function</replaceable> are required.
+ </para>
+
+ <para>
+ The arguments can appear in any order, not only the one shown above.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>CREATE TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-droptablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/drop_tablesamplemethod.sgml b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
new file mode 100644
index 0000000..dffd2ec
--- /dev/null
+++ b/doc/src/sgml/ref/drop_tablesamplemethod.sgml
@@ -0,0 +1,87 @@
+<!--
+doc/src/sgml/ref/drop_tablesamplemethod.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="SQL-DROPTABLESAMPLEMETHOD">
+ <indexterm zone="sql-droptablesamplemethod">
+ <primary>DROP TABLESAMPLE METHOD</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>DROP TABLESAMPLE METHOD</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>DROP TABLESAMPLE METHOD</refname>
+ <refpurpose>remove a custom tablesample method</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+DROP TABLESAMPLE METHOD [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>DROP TABLESAMPLE METHOD</command> drops an existing tablesample
+ method.
+ </para>
+
+ <para>
+ You must be a superuser to use <command>DROP TABLESAMPLE METHOD</command>.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>IF EXISTS</literal></term>
+ <listitem>
+ <para>
+ Do not throw an error if the tablesample method does not exist.
+ A notice is issued in this case.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an existing tablesample method to be removed.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no
+ <command>DROP TABLESAMPLE METHOD</command> statement in the SQL
+ standard.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-createtablesamplemethod"></member>
+ <member><xref linkend="sql-select"></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index fb18d94..0b576ad 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -106,6 +106,7 @@
&createServer;
&createTable;
&createTableAs;
+ &createTablesampleMethod;
&createTableSpace;
&createTSConfig;
&createTSDictionary;
@@ -147,6 +148,7 @@
&dropSequence;
&dropServer;
&dropTable;
+ &dropTablesampleMethod;
&dropTableSpace;
&dropTSConfig;
&dropTSDictionary;
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index bacb242..6acb5b3 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -46,6 +46,7 @@
#include "catalog/pg_policy.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -157,7 +158,8 @@ static const Oid object_classes[MAX_OCLASS] = {
DefaultAclRelationId, /* OCLASS_DEFACL */
ExtensionRelationId, /* OCLASS_EXTENSION */
EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
- PolicyRelationId /* OCLASS_POLICY */
+ PolicyRelationId, /* OCLASS_POLICY */
+ TableSampleMethodRelationId /* OCLASS_TABLESAMPLEMETHOD */
};
@@ -1265,6 +1267,10 @@ doDeletion(const ObjectAddress *object, int flags)
RemovePolicyById(object->objectId);
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ RemoveTablesampleMethodById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -1794,6 +1800,10 @@ find_expr_references_walker(Node *node,
case RTE_RELATION:
add_object_address(OCLASS_CLASS, rte->relid, 0,
context->addrs);
+ if (rte->tablesample)
+ add_object_address(OCLASS_TABLESAMPLEMETHOD,
+ rte->tablesample->tsmid, 0,
+ context->addrs);
break;
default:
break;
@@ -2373,6 +2383,9 @@ getObjectClass(const ObjectAddress *object)
case PolicyRelationId:
return OCLASS_POLICY;
+
+ case TableSampleMethodRelationId:
+ return OCLASS_TABLESAMPLEMETHOD;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 30cb699..f936332 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -44,6 +44,7 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_policy.h"
#include "catalog/pg_rewrite.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_trigger.h"
#include "catalog/pg_ts_config.h"
@@ -429,7 +430,19 @@ static const ObjectPropertyType ObjectProperty[] =
Anum_pg_type_typacl,
ACL_KIND_TYPE,
true
- }
+ },
+ {
+ TableSampleMethodRelationId,
+ TableSampleMethodOidIndexId,
+ TABLESAMPLEMETHODOID,
+ TABLESAMPLEMETHODNAME,
+ Anum_pg_tablesample_method_tsmname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ true
+ },
};
/*
@@ -528,7 +541,9 @@ ObjectTypeMap[] =
/* OCLASS_EVENT_TRIGGER */
{ "event trigger", OBJECT_EVENT_TRIGGER },
/* OCLASS_POLICY */
- { "policy", OBJECT_POLICY }
+ { "policy", OBJECT_POLICY },
+ /* OCLASS_TABLESAMPLEMETHOD */
+ { "tablesample method", OBJECT_TABLESAMPLEMETHOD }
};
const ObjectAddress InvalidObjectAddress =
@@ -683,6 +698,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_FDW:
case OBJECT_FOREIGN_SERVER:
case OBJECT_EVENT_TRIGGER:
+ case OBJECT_TABLESAMPLEMETHOD:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -921,6 +937,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_EVENT_TRIGGER:
msg = gettext_noop("event trigger name cannot be qualified");
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -981,6 +1000,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_event_trigger_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ address.classId = TableSampleMethodRelationId;
+ address.objectId = get_tablesample_method_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -2044,6 +2068,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
break;
case OBJECT_TSPARSER:
case OBJECT_TSTEMPLATE:
+ case OBJECT_TABLESAMPLEMETHOD:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -2982,6 +3007,21 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("tablesample method %s"),
+ NameStr(((Form_pg_tablesample_method) GETSTRUCT(tup))->tsmname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -3459,6 +3499,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "policy");
break;
+ case OCLASS_TABLESAMPLEMETHOD:
+ appendStringInfoString(&buffer, "tablesample method");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -4381,6 +4425,23 @@ getObjectIdentityParts(const ObjectAddress *object,
break;
}
+ case OCLASS_TABLESAMPLEMETHOD:
+ {
+ HeapTuple tup;
+ Form_pg_tablesample_method tsmForm;
+
+ tup = SearchSysCache1(TABLESAMPLEMETHODOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ object->objectId);
+ tsmForm = (Form_pg_tablesample_method) GETSTRUCT(tup);
+ appendStringInfoString(&buffer,
+ quote_identifier(NameStr(tsmForm->tsmname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index b1ac704..04fcd8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -18,8 +18,8 @@ OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
policy.o portalcmds.o prepare.o proclang.o \
- schemacmds.o seclabel.o sequence.o tablecmds.o tablespace.o trigger.o \
- tsearchcmds.o typecmds.o user.o vacuum.o vacuumlazy.o \
- variable.o view.o
+ schemacmds.o seclabel.o sequence.o tablecmds.o tablesample.o \
+ tablespace.o trigger.o tsearchcmds.o typecmds.o user.o vacuum.o \
+ vacuumlazy.o variable.o view.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index a1b0d4d..c307dcf 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -429,6 +429,10 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
}
}
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ msg = gettext_noop("tablesample method \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
default:
elog(ERROR, "unexpected object type (%d)", (int) objtype);
break;
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index f07fd06..5a8d286 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -97,6 +97,7 @@ static event_trigger_support_data event_trigger_support[] = {
{"SEQUENCE", true},
{"SERVER", true},
{"TABLE", true},
+ {"TABLESAMPLE METHOD", true},
{"TABLESPACE", false},
{"TRIGGER", true},
{"TEXT SEARCH CONFIGURATION", true},
@@ -1090,6 +1091,7 @@ EventTriggerSupportsObjectType(ObjectType obtype)
case OBJECT_SEQUENCE:
case OBJECT_TABCONSTRAINT:
case OBJECT_TABLE:
+ case OBJECT_TABLESAMPLEMETHOD:
case OBJECT_TRIGGER:
case OBJECT_TSCONFIGURATION:
case OBJECT_TSDICTIONARY:
@@ -1147,6 +1149,7 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
case OCLASS_POLICY:
+ case OCLASS_TABLESAMPLEMETHOD:
return true;
case MAX_OCLASS:
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 06e4332..e527caf 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8236,6 +8236,7 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
case OCLASS_USER_MAPPING:
case OCLASS_DEFACL:
case OCLASS_EXTENSION:
+ case OCLASS_TABLESAMPLEMETHOD:
/*
* We don't expect any of these sorts of objects to depend on
diff --git a/src/backend/commands/tablesample.c b/src/backend/commands/tablesample.c
new file mode 100644
index 0000000..33581f6
--- /dev/null
+++ b/src/backend/commands/tablesample.c
@@ -0,0 +1,398 @@
+/*-------------------------------------------------------------------------
+ *
+ * tablesample.c
+ * Commands to manipulate tablesample methods
+ *
+ * Table sampling methods provide the algorithms used to perform a sample
+ * scan over a table.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/tablesample.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <unistd.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_tablesample_method.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "miscadmin.h"
+#include "parser/parse_func.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+
+
+static Datum
+get_tablesample_method_func(DefElem *defel, int attnum)
+{
+ List *funcName = defGetQualifiedName(defel);
+ /* Big enough size for our needs. */
+ Oid *typeId = palloc0(7 * sizeof(Oid));
+ Oid retTypeId;
+ int nargs;
+ Oid procOid = InvalidOid;
+ FuncCandidateList clist;
+
+ switch (attnum)
+ {
+ case Anum_pg_tablesample_method_tsminit:
+ /*
+ * tsminit needs special handling because it is defined as a function
+ * with 3 or more arguments; only the first two must have a specific
+ * type, the rest are up to the tablesample method's author.
+ */
+ {
+ nargs = 2;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ retTypeId = VOIDOID;
+
+ clist = FuncnameGetCandidates(funcName, -1, NIL, false, false, false);
+
+ while (clist)
+ {
+ if (clist->nargs >= 3 &&
+ memcmp(typeId, clist->args, nargs * sizeof(Oid)) == 0)
+ {
+ procOid = clist->oid;
+ /* Save real function signature for future errors. */
+ nargs = clist->nargs;
+ pfree(typeId);
+ typeId = clist->args;
+ break;
+ }
+ clist = clist->next;
+ }
+
+ if (!OidIsValid(procOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("function \"%s\" does not exist or does not have valid signature",
+ NameListToString(funcName)),
+ errhint("The tablesample method init function "
+ "must have at least 3 input parameters, "
+ "the first of type INTERNAL and the second of type INTEGER.")));
+ }
+ break;
+
+ case Anum_pg_tablesample_method_tsmnextblock:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = INT4OID;
+ break;
+ case Anum_pg_tablesample_method_tsmnexttuple:
+ nargs = 3;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INT2OID;
+ retTypeId = INT2OID;
+ break;
+ case Anum_pg_tablesample_method_tsmexaminetuple:
+ nargs = 4;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INT4OID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = BOOLOID;
+ retTypeId = BOOLOID;
+ break;
+ case Anum_pg_tablesample_method_tsmend:
+ case Anum_pg_tablesample_method_tsmreset:
+ nargs = 1;
+ typeId[0] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ case Anum_pg_tablesample_method_tsmcost:
+ nargs = 7;
+ typeId[0] = INTERNALOID;
+ typeId[1] = INTERNALOID;
+ typeId[2] = INTERNALOID;
+ typeId[3] = INTERNALOID;
+ typeId[4] = INTERNALOID;
+ typeId[5] = INTERNALOID;
+ typeId[6] = INTERNALOID;
+ retTypeId = VOIDOID;
+ break;
+ default:
+ /* should not be here */
+ elog(ERROR, "unrecognized attribute for tablesample method: %d",
+ attnum);
+ nargs = 0; /* keep compiler quiet */
+ }
+
+ if (!OidIsValid(procOid))
+ procOid = LookupFuncName(funcName, nargs, typeId, false);
+ if (get_func_rettype(procOid) != retTypeId)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("function %s should return type %s",
+ func_signature_string(funcName, nargs, NIL, typeId),
+ format_type_be(retTypeId))));
+
+ return ObjectIdGetDatum(procOid);
+}
+
+/*
+ * make pg_depend entries for a new pg_tablesample_method entry
+ */
+static void
+makeTablesampleMethodDeps(HeapTuple tuple)
+{
+ Form_pg_tablesample_method tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ ObjectAddress myself,
+ referenced;
+
+ myself.classId = TableSampleMethodRelationId;
+ myself.objectId = HeapTupleGetOid(tuple);
+ myself.objectSubId = 0;
+
+ /* dependency on extension */
+ recordDependencyOnCurrentExtension(&myself, false);
+
+ /* dependencies on functions */
+ referenced.classId = ProcedureRelationId;
+ referenced.objectSubId = 0;
+
+ referenced.objectId = tsm->tsminit;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnextblock;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmnexttuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ if (OidIsValid(tsm->tsmexaminetuple))
+ {
+ referenced.objectId = tsm->tsmexaminetuple;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ }
+
+ referenced.objectId = tsm->tsmend;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmreset;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ referenced.objectId = tsm->tsmcost;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+}
+
+/*
+ * Create a table sampling method
+ *
+ * Only superusers can create table sampling methods.
+ */
+ObjectAddress
+DefineTablesampleMethod(List *names, List *parameters)
+{
+ char *tsmname = strVal(linitial(names));
+ Oid tsmoid;
+ ListCell *pl;
+ Relation rel;
+ Datum values[Natts_pg_tablesample_method];
+ bool nulls[Natts_pg_tablesample_method];
+ HeapTuple tuple;
+ ObjectAddress address;
+
+ /* Must be superuser. */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to create tablesample method \"%s\"",
+ tsmname),
+ errhint("Must be superuser to create a tablesample method.")));
+
+ /* Must not already exist. */
+ tsmoid = get_tablesample_method_oid(tsmname, true);
+ if (OidIsValid(tsmoid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("tablesample method \"%s\" already exists",
+ tsmname)));
+
+ /* Initialize the values. */
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+
+ values[Anum_pg_tablesample_method_tsmname - 1] =
+ DirectFunctionCall1(namein, CStringGetDatum(tsmname));
+
+ /*
+ * loop over the definition list and extract the information we need.
+ */
+ foreach(pl, parameters)
+ {
+ DefElem *defel = (DefElem *) lfirst(pl);
+
+ if (pg_strcasecmp(defel->defname, "seqscan") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmseqscan - 1] =
+ BoolGetDatum(defGetBoolean(defel));
+ }
+ else if (pg_strcasecmp(defel->defname, "pagemode") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmpagemode - 1] =
+ BoolGetDatum(defGetBoolean(defel));
+ }
+ else if (pg_strcasecmp(defel->defname, "init") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsminit - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsminit);
+ }
+ else if (pg_strcasecmp(defel->defname, "nextblock") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnextblock - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnextblock);
+ }
+ else if (pg_strcasecmp(defel->defname, "nexttuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmnexttuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmnexttuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "examinetuple") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmexaminetuple - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmexaminetuple);
+ }
+ else if (pg_strcasecmp(defel->defname, "end") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmend - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmend);
+ }
+ else if (pg_strcasecmp(defel->defname, "reset") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmreset - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmreset);
+ }
+ else if (pg_strcasecmp(defel->defname, "cost") == 0)
+ {
+ values[Anum_pg_tablesample_method_tsmcost - 1] =
+ get_tablesample_method_func(defel,
+ Anum_pg_tablesample_method_tsmcost);
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("tablesample method parameter \"%s\" not recognized",
+ defel->defname)));
+ }
+
+ /*
+ * Validation.
+ */
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsminit - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method init function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnextblock - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nextblock function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmnexttuple - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method nexttuple function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmend - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method end function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmreset - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method reset function is required")));
+
+ if (!OidIsValid(DatumGetObjectId(values[Anum_pg_tablesample_method_tsmcost - 1])))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("tablesample method cost function is required")));
+
+ /*
+ * Insert tuple into pg_tablesample_method.
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = heap_form_tuple(rel->rd_att, values, nulls);
+
+ tsmoid = simple_heap_insert(rel, tuple);
+
+ CatalogUpdateIndexes(rel, tuple);
+
+ makeTablesampleMethodDeps(tuple);
+
+ heap_freetuple(tuple);
+
+ /* Post creation hook for new tablesample method */
+ InvokeObjectPostCreateHook(TableSampleMethodRelationId, tsmoid, 0);
+
+ ObjectAddressSet(address, TableSampleMethodRelationId, tsmoid);
+
+ heap_close(rel, RowExclusiveLock);
+
+ return address;
+}
+
+/*
+ * Drop a tablesample method.
+ */
+void
+RemoveTablesampleMethodById(Oid tsmoid)
+{
+ Relation rel;
+ HeapTuple tuple;
+ Form_pg_tablesample_method tsm;
+
+ /*
+ * Find the target tuple
+ */
+ rel = heap_open(TableSampleMethodRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODOID, ObjectIdGetDatum(tsmoid));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for tablesample method %u",
+ tsmoid);
+
+ tsm = (Form_pg_tablesample_method) GETSTRUCT(tuple);
+ /* Can't drop builtin tablesample methods. */
+ if (tsmoid == TABLESAMPLE_METHOD_SYSTEM_OID ||
+ tsmoid == TABLESAMPLE_METHOD_BERNOULLI_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied for tablesample method %s",
+ NameStr(tsm->tsmname))));
+
+ /*
+ * Remove the pg_tablesample_method tuple
+ */
+ simple_heap_delete(rel, &tuple->t_self);
+
+ ReleaseSysCache(tuple);
+
+ heap_close(rel, RowExclusiveLock);
+}
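The init-function lookup in get_tablesample_method_func() above can be illustrated outside the backend. What follows is only a minimal standalone sketch, not the patch's code: SketchCandidate, the SKETCH_* constants and sketch_find_init_func() are hypothetical stand-ins for FuncCandidateList, the real type OIDs and the loop over FuncnameGetCandidates() results. It shows the same rule: accept the first candidate that has at least 3 arguments and whose first two argument types are (internal, int4).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;

/* Hypothetical stand-ins for type OIDs; the values are illustrative only. */
#define SKETCH_INTERNALOID 1u
#define SKETCH_INT4OID     2u
#define SKETCH_FLOAT4OID   3u

/* Simplified candidate entry, loosely modelled on FuncCandidateList. */
typedef struct SketchCandidate
{
    struct SketchCandidate *next;
    Oid         oid;            /* function OID */
    int         nargs;          /* number of arguments */
    const Oid  *args;           /* argument type OIDs */
} SketchCandidate;

/*
 * Return the OID of the first candidate that takes at least 3 arguments
 * and whose first two argument types are (internal, int4).  Returns 0
 * (standing in for InvalidOid) when no candidate qualifies.
 */
static Oid
sketch_find_init_func(const SketchCandidate *clist)
{
    const Oid   prefix[2] = {SKETCH_INTERNALOID, SKETCH_INT4OID};

    for (; clist != NULL; clist = clist->next)
    {
        assert(clist->args != NULL);

        /* Only the fixed prefix is compared; trailing types are free. */
        if (clist->nargs >= 3 &&
            memcmp(prefix, clist->args, sizeof(prefix)) == 0)
            return clist->oid;
    }

    return 0;
}
```

Comparing only the two-element prefix is what lets each sampling method declare its own trailing parameters, while the nargs >= 3 test rejects functions that take no method-specific parameter at all.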
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d5405ad..acb1f18 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -590,7 +590,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
- MAPPING MATCH MATERIALIZED MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
+ MAPPING MATCH MATERIALIZED MAXVALUE METHOD MINUTE_P MINVALUE MODE MONTH_P
+ MOVE
NAME_P NAMES NATIONAL NATURAL NCHAR NEXT NO NONE
NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF
@@ -5103,6 +5104,15 @@ DefineStmt:
n->definition = list_make1(makeDefElem("from", (Node *) $5));
$$ = (Node *)n;
}
+ | CREATE TABLESAMPLE METHOD name definition
+ {
+ DefineStmt *n = makeNode(DefineStmt);
+ n->kind = OBJECT_TABLESAMPLEMETHOD;
+ n->args = NIL;
+ n->defnames = list_make1(makeString($4));
+ n->definition = $5;
+ $$ = (Node *)n;
+ }
;
definition: '(' def_list ')' { $$ = $2; }
@@ -5557,6 +5567,7 @@ drop_type: TABLE { $$ = OBJECT_TABLE; }
| TEXT_P SEARCH DICTIONARY { $$ = OBJECT_TSDICTIONARY; }
| TEXT_P SEARCH TEMPLATE { $$ = OBJECT_TSTEMPLATE; }
| TEXT_P SEARCH CONFIGURATION { $$ = OBJECT_TSCONFIGURATION; }
+ | TABLESAMPLE METHOD { $$ = OBJECT_TABLESAMPLEMETHOD; }
;
any_name_list:
@@ -13410,6 +13421,7 @@ unreserved_keyword:
| MATCH
| MATERIALIZED
| MAXVALUE
+ | METHOD
| MINUTE_P
| MINVALUE
| MODE
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index fd09d3a..cadf6b4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -23,6 +23,7 @@
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/namespace.h"
+#include "catalog/pg_tablesample_method.h"
#include "catalog/toasting.h"
#include "commands/alter.h"
#include "commands/async.h"
@@ -1136,6 +1137,11 @@ ProcessUtilitySlow(Node *parsetree,
Assert(stmt->args == NIL);
DefineCollation(stmt->defnames, stmt->definition);
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ Assert(stmt->args == NIL);
+ Assert(list_length(stmt->defnames) == 1);
+ DefineTablesampleMethod(stmt->defnames, stmt->definition);
+ break;
default:
elog(ERROR, "unrecognized define stmt type: %d",
(int) stmt->kind);
@@ -2004,6 +2010,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_POLICY:
tag = "DROP POLICY";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "DROP TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
@@ -2100,6 +2109,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_COLLATION:
tag = "CREATE COLLATION";
break;
+ case OBJECT_TABLESAMPLEMETHOD:
+ tag = "CREATE TABLESAMPLE METHOD";
+ break;
default:
tag = "???";
}
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 9be3d64..2c90997 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2930,6 +2930,37 @@ get_range_subtype(Oid rangeOid)
/* ---------- PG_TABLESAMPLE_METHOD CACHE ---------- */
/*
+ * get_tablesample_method_oid - given a tablesample method name,
+ * look up the OID
+ *
+ * If missing_ok is false, throw an error if the tablesample method name is
+ * not found. If true, just return InvalidOid.
+ */
+Oid
+get_tablesample_method_oid(const char *tsmname, bool missing_ok)
+{
+ Oid result;
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(TABLESAMPLEMETHODNAME, PointerGetDatum(tsmname));
+ if (HeapTupleIsValid(tuple))
+ {
+ result = HeapTupleGetOid(tuple);
+ ReleaseSysCache(tuple);
+ }
+ else
+ result = InvalidOid;
+
+ if (!OidIsValid(result) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("tablesample method \"%s\" does not exist",
+ tsmname)));
+
+ return result;
+}
+
+/*
* get_tablesample_method_name - given a tablesample method OID,
* look up the name or NULL if not found
*/
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 1a0a587..8a64e4b 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -103,6 +103,7 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
int numForeignServers;
int numDefaultACLs;
int numEventTriggers;
+ int numTSMs;
if (g_verbose)
write_msg(NULL, "reading schemas\n");
@@ -251,6 +252,10 @@ getSchemaData(Archive *fout, DumpOptions *dopt, int *numTablesPtr)
write_msg(NULL, "reading policies\n");
getPolicies(fout, tblinfo, numTables);
+ if (g_verbose)
+ write_msg(NULL, "reading tablesample methods\n");
+ getTableSampleMethods(fout, &numTSMs);
+
*numTablesPtr = numTables;
return tblinfo;
}
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index fe08c1b..69ca267 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -182,6 +182,7 @@ static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo);
static void dumpIndex(Archive *fout, DumpOptions *dopt, IndxInfo *indxinfo);
static void dumpConstraint(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
static void dumpTableConstraintComment(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo);
+static void dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo);
static void dumpTSParser(Archive *fout, DumpOptions *dopt, TSParserInfo *prsinfo);
static void dumpTSDictionary(Archive *fout, DumpOptions *dopt, TSDictInfo *dictinfo);
static void dumpTSTemplate(Archive *fout, DumpOptions *dopt, TSTemplateInfo *tmplinfo);
@@ -7134,6 +7135,78 @@ getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tblinfo, int numTable
}
/*
+ * getTableSampleMethods:
+ * read all user-defined tablesample methods in the system catalogs and
+ * return them as an array of TSMInfo structures
+ *
+ * numTSMs is set to the number of tablesample methods read in
+ */
+TSMInfo *
+getTableSampleMethods(Archive *fout, int *numTSMs)
+{
+ PGresult *res;
+ int ntups;
+ int i;
+ PQExpBuffer query;
+ TSMInfo *tsminfo;
+ int i_tableoid,
+ i_oid,
+ i_tsmname,
+ i_tsmseqscan,
+ i_tsmpagemode;
+
+ /* Before 9.5, there were no tablesample methods */
+ if (fout->remoteVersion < 90500)
+ {
+ *numTSMs = 0;
+ return NULL;
+ }
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT tableoid, oid, tsmname, tsmseqscan, tsmpagemode "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid >= '%u'::pg_catalog.oid",
+ FirstNormalObjectId);
+
+ res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+ ntups = PQntuples(res);
+ *numTSMs = ntups;
+
+ tsminfo = (TSMInfo *) pg_malloc(ntups * sizeof(TSMInfo));
+
+ i_tableoid = PQfnumber(res, "tableoid");
+ i_oid = PQfnumber(res, "oid");
+ i_tsmname = PQfnumber(res, "tsmname");
+ i_tsmseqscan = PQfnumber(res, "tsmseqscan");
+ i_tsmpagemode = PQfnumber(res, "tsmpagemode");
+
+ for (i = 0; i < ntups; i++)
+ {
+ tsminfo[i].dobj.objType = DO_TABLESAMPLE_METHOD;
+ tsminfo[i].dobj.catId.tableoid = atooid(PQgetvalue(res, i, i_tableoid));
+ tsminfo[i].dobj.catId.oid = atooid(PQgetvalue(res, i, i_oid));
+ AssignDumpId(&tsminfo[i].dobj);
+ tsminfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_tsmname));
+ tsminfo[i].dobj.namespace = NULL;
+ tsminfo[i].tsmseqscan = PQgetvalue(res, i, i_tsmseqscan)[0] == 't';
+ tsminfo[i].tsmpagemode = PQgetvalue(res, i, i_tsmpagemode)[0] == 't';
+
+ /* Decide whether we want to dump it */
+ selectDumpableObject(&(tsminfo[i].dobj));
+ }
+
+ PQclear(res);
+
+ destroyPQExpBuffer(query);
+
+ return tsminfo;
+}
+
+
+/*
* Test whether a column should be printed as part of table's CREATE TABLE.
* Column number is zero-based.
*
@@ -8226,6 +8299,9 @@ dumpDumpableObject(Archive *fout, DumpOptions *dopt, DumpableObject *dobj)
case DO_POLICY:
dumpPolicy(fout, dopt, (PolicyInfo *) dobj);
break;
+ case DO_TABLESAMPLE_METHOD:
+ dumpTableSampleMethod(fout, dopt, (TSMInfo *) dobj);
+ break;
case DO_PRE_DATA_BOUNDARY:
case DO_POST_DATA_BOUNDARY:
/* never dumped, nothing to do */
@@ -12226,6 +12302,106 @@ dumpAgg(Archive *fout, DumpOptions *dopt, AggInfo *agginfo)
}
/*
+ * dumpTableSampleMethod
+ * write the declaration of one user-defined tablesample method
+ */
+static void
+dumpTableSampleMethod(Archive *fout, DumpOptions *dopt, TSMInfo *tsminfo)
+{
+ PGresult *res;
+ PQExpBuffer q;
+ PQExpBuffer delq;
+ PQExpBuffer labelq;
+ PQExpBuffer query;
+ char *tsminit;
+ char *tsmnextblock;
+ char *tsmnexttuple;
+ char *tsmexaminetuple;
+ char *tsmend;
+ char *tsmreset;
+ char *tsmcost;
+
+ /* Skip if not to be dumped */
+ if (!tsminfo->dobj.dump || dopt->dataOnly)
+ return;
+
+ q = createPQExpBuffer();
+ delq = createPQExpBuffer();
+ labelq = createPQExpBuffer();
+ query = createPQExpBuffer();
+
+ /* Make sure we are in proper schema */
+ selectSourceSchema(fout, "pg_catalog");
+
+ appendPQExpBuffer(query, "SELECT tsminit, tsmnextblock, "
+ "tsmnexttuple, tsmexaminetuple, "
+ "tsmend, tsmreset, tsmcost "
+ "FROM pg_catalog.pg_tablesample_method "
+ "WHERE oid = '%u'::pg_catalog.oid",
+ tsminfo->dobj.catId.oid);
+
+ res = ExecuteSqlQueryForSingleRow(fout, query->data);
+
+ tsminit = PQgetvalue(res, 0, PQfnumber(res, "tsminit"));
+ tsmnexttuple = PQgetvalue(res, 0, PQfnumber(res, "tsmnexttuple"));
+ tsmnextblock = PQgetvalue(res, 0, PQfnumber(res, "tsmnextblock"));
+ tsmexaminetuple = PQgetvalue(res, 0, PQfnumber(res, "tsmexaminetuple"));
+ tsmend = PQgetvalue(res, 0, PQfnumber(res, "tsmend"));
+ tsmreset = PQgetvalue(res, 0, PQfnumber(res, "tsmreset"));
+ tsmcost = PQgetvalue(res, 0, PQfnumber(res, "tsmcost"));
+
+ appendPQExpBuffer(q, "CREATE TABLESAMPLE METHOD %s (\n",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(q, " INIT = %s,\n", tsminit);
+ appendPQExpBuffer(q, " NEXTTUPLE = %s,\n", tsmnexttuple);
+ appendPQExpBuffer(q, " NEXTBLOCK = %s,\n", tsmnextblock);
+ appendPQExpBuffer(q, " END = %s,\n", tsmend);
+ appendPQExpBuffer(q, " RESET = %s,\n", tsmreset);
+ appendPQExpBuffer(q, " COST = %s", tsmcost);
+
+ if (strcmp(tsmexaminetuple, "-") != 0)
+ appendPQExpBuffer(q, ",\n EXAMINETUPLE = %s", tsmexaminetuple);
+
+ if (tsminfo->tsmseqscan)
+ appendPQExpBufferStr(q, ",\n SEQSCAN = true");
+
+ if (tsminfo->tsmpagemode)
+ appendPQExpBufferStr(q, ",\n PAGEMODE = true");
+
+ appendPQExpBufferStr(q, "\n);\n");
+
+ appendPQExpBuffer(delq, "DROP TABLESAMPLE METHOD %s;\n",
+ fmtId(tsminfo->dobj.name));
+
+ appendPQExpBuffer(labelq, "TABLESAMPLE METHOD %s",
+ fmtId(tsminfo->dobj.name));
+
+ if (dopt->binary_upgrade)
+ binary_upgrade_extension_member(q, &tsminfo->dobj, labelq->data);
+
+ ArchiveEntry(fout, tsminfo->dobj.catId, tsminfo->dobj.dumpId,
+ tsminfo->dobj.name,
+ NULL,
+ NULL,
+ "",
+ false, "TABLESAMPLE METHOD", SECTION_PRE_DATA,
+ q->data, delq->data, NULL,
+ NULL, 0,
+ NULL, NULL);
+
+ /* Dump Tablesample Method Comments */
+ dumpComment(fout, dopt, labelq->data,
+ NULL, "",
+ tsminfo->dobj.catId, 0, tsminfo->dobj.dumpId);
+ PQclear(res);
+ destroyPQExpBuffer(q);
+ destroyPQExpBuffer(delq);
+ destroyPQExpBuffer(labelq);
+ destroyPQExpBuffer(query);
+}
+
+/*
* dumpTSParser
* write out a single text search parser
*/
@@ -15659,6 +15835,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
case DO_FDW:
case DO_FOREIGN_SERVER:
case DO_BLOB:
+ case DO_TABLESAMPLE_METHOD:
/* Pre-data objects: must come before the pre-data boundary */
addObjectDependency(preDataBound, dobj->dumpId);
break;
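For readers who want to see the shape of the statement dumpTableSampleMethod() emits, here is a rough standalone sketch. It is not pg_dump code: SketchTsm and sketch_build_create_tsm() are hypothetical names, a fixed buffer with snprintf() stands in for PQExpBuffer, and identifier quoting (fmtId) is left out. It assumes, as the patch does, that an unset examinetuple function is reported as "-" and that the optional clauses follow the mandatory ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened view of one pg_tablesample_method row. */
typedef struct SketchTsm
{
    const char *name;
    const char *init, *nextblock, *nexttuple, *examinetuple;
    const char *end, *reset, *cost;
    bool        seqscan, pagemode;
} SketchTsm;

/*
 * Write a CREATE TABLESAMPLE METHOD statement into buf.  Returns the
 * number of characters written, or -1 if a buffer is too small.
 */
static int
sketch_build_create_tsm(char *buf, size_t buflen, const SketchTsm *tsm)
{
    char        examine[128] = "";
    int         n;

    assert(buf != NULL && tsm != NULL);

    /* EXAMINETUPLE is optional; "-" means no function was supplied. */
    if (strcmp(tsm->examinetuple, "-") != 0)
    {
        n = snprintf(examine, sizeof(examine),
                     ",\n    EXAMINETUPLE = %s", tsm->examinetuple);
        if (n < 0 || (size_t) n >= sizeof(examine))
            return -1;
    }

    n = snprintf(buf, buflen,
                 "CREATE TABLESAMPLE METHOD %s (\n"
                 "    INIT = %s,\n"
                 "    NEXTTUPLE = %s,\n"
                 "    NEXTBLOCK = %s,\n"
                 "    END = %s,\n"
                 "    RESET = %s,\n"
                 "    COST = %s%s%s%s\n);\n",
                 tsm->name, tsm->init, tsm->nexttuple, tsm->nextblock,
                 tsm->end, tsm->reset, tsm->cost,
                 examine,
                 tsm->seqscan ? ",\n    SEQSCAN = true" : "",
                 tsm->pagemode ? ",\n    PAGEMODE = true" : "");

    return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}
```

Emitting the optional clauses with a leading comma keeps the mandatory list free of a trailing comma whichever options are present, which mirrors how the patch appends EXAMINETUPLE, SEQSCAN and PAGEMODE after COST.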
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index a9d3c10..87bef24 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -76,7 +76,8 @@ typedef enum
DO_POST_DATA_BOUNDARY,
DO_EVENT_TRIGGER,
DO_REFRESH_MATVIEW,
- DO_POLICY
+ DO_POLICY,
+ DO_TABLESAMPLE_METHOD
} DumpableObjectType;
typedef struct _dumpableObject
@@ -383,6 +384,13 @@ typedef struct _inhInfo
Oid inhparent; /* OID of its parent */
} InhInfo;
+typedef struct _tsmInfo
+{
+ DumpableObject dobj;
+ bool tsmseqscan;
+ bool tsmpagemode;
+} TSMInfo;
+
typedef struct _prsInfo
{
DumpableObject dobj;
@@ -536,6 +544,7 @@ extern ProcLangInfo *getProcLangs(Archive *fout, int *numProcLangs);
extern CastInfo *getCasts(Archive *fout, DumpOptions *dopt, int *numCasts);
extern void getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tbinfo, int numTables);
extern bool shouldPrintColumn(DumpOptions *dopt, TableInfo *tbinfo, int colno);
+extern TSMInfo *getTableSampleMethods(Archive *fout, int *numTSMs);
extern TSParserInfo *getTSParsers(Archive *fout, int *numTSParsers);
extern TSDictInfo *getTSDictionaries(Archive *fout, int *numTSDicts);
extern TSTemplateInfo *getTSTemplates(Archive *fout, int *numTSTemplates);
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index c5ed593..9567cf6 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -73,7 +73,8 @@ static const int oldObjectTypePriority[] =
13, /* DO_POST_DATA_BOUNDARY */
20, /* DO_EVENT_TRIGGER */
15, /* DO_REFRESH_MATVIEW */
- 21 /* DO_POLICY */
+ 21, /* DO_POLICY */
+ 5 /* DO_TABLESAMPLE_METHOD */
};
/*
@@ -122,7 +123,8 @@ static const int newObjectTypePriority[] =
25, /* DO_POST_DATA_BOUNDARY */
32, /* DO_EVENT_TRIGGER */
33, /* DO_REFRESH_MATVIEW */
- 34 /* DO_POLICY */
+ 34, /* DO_POLICY */
+ 17 /* DO_TABLESAMPLE_METHOD */
};
static DumpId preDataBoundId;
@@ -1460,6 +1462,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
"POLICY (ID %d OID %u)",
obj->dumpId, obj->catId.oid);
return;
+ case DO_TABLESAMPLE_METHOD:
+ snprintf(buf, bufsize,
+ "TABLESAMPLE METHOD %s (ID %d OID %u)",
+ obj->name, obj->dumpId, obj->catId.oid);
+ return;
case DO_PRE_DATA_BOUNDARY:
snprintf(buf, bufsize,
"PRE-DATA BOUNDARY (ID %d)",
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 6481ac8..30653f8 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -148,6 +148,7 @@ typedef enum ObjectClass
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
OCLASS_POLICY, /* pg_policy */
+ OCLASS_TABLESAMPLEMETHOD, /* pg_tablesample_method */
MAX_OCLASS /* MUST BE LAST */
} ObjectClass;
diff --git a/src/include/catalog/pg_tablesample_method.h b/src/include/catalog/pg_tablesample_method.h
index a58e1cf..82c15f3 100644
--- a/src/include/catalog/pg_tablesample_method.h
+++ b/src/include/catalog/pg_tablesample_method.h
@@ -72,7 +72,17 @@ typedef FormData_pg_tablesample_method *Form_pg_tablesample_method;
DATA(insert OID = 3293 ( system false true tsm_system_init tsm_system_nextblock tsm_system_nexttuple - tsm_system_end tsm_system_reset tsm_system_cost ));
DESCR("SYSTEM table sampling method");
+#define TABLESAMPLE_METHOD_SYSTEM_OID 3293
DATA(insert OID = 3294 ( bernoulli true false tsm_bernoulli_init tsm_bernoulli_nextblock tsm_bernoulli_nexttuple - tsm_bernoulli_end tsm_bernoulli_reset tsm_bernoulli_cost ));
DESCR("BERNOULLI table sampling method");
+#define TABLESAMPLE_METHOD_BERNOULLI_OID 3294
+
+/* ----------------
+ * functions for manipulation of pg_tablesample_method
+ * ----------------
+ */
+
+extern ObjectAddress DefineTablesampleMethod(List *names, List *parameters);
+extern void RemoveTablesampleMethodById(Oid tsmoid);
#endif /* PG_TABLESAMPLE_METHOD_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index aea499e..a2bb920 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1292,6 +1292,7 @@ typedef enum ObjectType
OBJECT_SEQUENCE,
OBJECT_TABCONSTRAINT,
OBJECT_TABLE,
+ OBJECT_TABLESAMPLEMETHOD,
OBJECT_TABLESPACE,
OBJECT_TRIGGER,
OBJECT_TSCONFIGURATION,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index ae90df8..902c189 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -236,6 +236,7 @@ PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD)
PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD)
+PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD)
PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD)
PG_KEYWORD("minvalue", MINVALUE, UNRESERVED_KEYWORD)
PG_KEYWORD("mode", MODE, UNRESERVED_KEYWORD)
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index ea1aa11..518f27f 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -153,6 +153,7 @@ extern void free_attstatsslot(Oid atttype,
extern char *get_namespace_name(Oid nspid);
extern char *get_namespace_name_or_temp(Oid nspid);
extern Oid get_range_subtype(Oid rangeOid);
+extern Oid get_tablesample_method_oid(const char *tsmname, bool missing_ok);
extern char *get_tablesample_method_name(Oid tsmid);
#define type_is_array(typid) (get_element_type(typid) != InvalidOid)
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 93d93af..37ea524 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -9,7 +9,8 @@ SUBDIRS = \
worker_spi \
dummy_seclabel \
test_shm_mq \
- test_parser
+ test_parser \
+ tablesample
all: submake-errcodes
diff --git a/src/test/modules/tablesample/.gitignore b/src/test/modules/tablesample/.gitignore
new file mode 100644
index 0000000..5dcb3ff
--- /dev/null
+++ b/src/test/modules/tablesample/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/tablesample/Makefile b/src/test/modules/tablesample/Makefile
new file mode 100644
index 0000000..469b004
--- /dev/null
+++ b/src/test/modules/tablesample/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/tablesample/Makefile
+
+MODULE_big = tsm_test
+OBJS = tsm_test.o $(WIN32RES)
+PGFILEDESC = "tsm_test - example of a custom tablesample method"
+
+EXTENSION = tsm_test
+DATA = tsm_test--1.0.sql
+
+REGRESS = tablesample
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/tablesample
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/tablesample/expected/tablesample.out b/src/test/modules/tablesample/expected/tablesample.out
new file mode 100644
index 0000000..ad62e32
--- /dev/null
+++ b/src/test/modules/tablesample/expected/tablesample.out
@@ -0,0 +1,38 @@
+CREATE EXTENSION tsm_test;
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ c81e728d9d4c2f636f067f89cc14862c | 0.5
+ a87ff679a2f3e71d9181a67b7542122c | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ d3d9446802a44259755d38e6d163e820 | 0.5
+(6 rows)
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+ a | b
+----------------------------------+-----
+ c4ca4238a0b923820dcc509a6f75849b | 0.5
+ e4da3b7fbbce2345d7772b0674a318d5 | 0.5
+ 1679091c5a880faf6fb5e6087eb1b2dc | 0.5
+ 8f14e45fceea167a5a36dedd4bea2543 | 0.5
+ c9f0f895fb98ab9159f51fd0297e236d | 0.5
+(5 rows)
+
+DROP TABLESAMPLE METHOD tsm_test;
+ERROR: cannot drop tablesample method tsm_test because extension tsm_test requires it
+HINT: You can drop extension tsm_test instead.
+DROP EXTENSION tsm_test;
+ERROR: cannot drop extension tsm_test because other objects depend on it
+DETAIL: view test_tsm_v depends on tablesample method tsm_test
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
+DROP EXTENSION tsm_test CASCADE;
+NOTICE: drop cascades to view test_tsm_v
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
+ tsmname | tsmseqscan | tsmpagemode | tsminit | tsmnextblock | tsmnexttuple | tsmexaminetuple | tsmend | tsmreset | tsmcost
+---------+------------+-------------+---------+--------------+--------------+-----------------+--------+----------+---------
+(0 rows)
+
diff --git a/src/test/modules/tablesample/sql/tablesample.sql b/src/test/modules/tablesample/sql/tablesample.sql
new file mode 100644
index 0000000..b1104d6
--- /dev/null
+++ b/src/test/modules/tablesample/sql/tablesample.sql
@@ -0,0 +1,14 @@
+CREATE EXTENSION tsm_test;
+
+CREATE TABLE test_tsm AS SELECT md5(i::text) a, 0.5::float b FROM generate_series(1,10) g(i);
+
+SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (1);
+
+CREATE VIEW test_tsm_v AS SELECT * FROM test_tsm TABLESAMPLE tsm_test('b') REPEATABLE (9999);
+SELECT * FROM test_tsm_v;
+
+DROP TABLESAMPLE METHOD tsm_test;
+DROP EXTENSION tsm_test;
+DROP EXTENSION tsm_test CASCADE;
+
+SELECT * FROM pg_tablesample_method WHERE tsmname = 'tsm_test';
diff --git a/src/test/modules/tablesample/tsm_test--1.0.sql b/src/test/modules/tablesample/tsm_test--1.0.sql
new file mode 100644
index 0000000..e5a9ae8
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test--1.0.sql
@@ -0,0 +1,52 @@
+/* src/test/modules/tablesample/tsm_test--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION tsm_test" to load this file. \quit
+
+CREATE FUNCTION tsm_test_init(internal, int4, text)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nextblock(internal)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_nexttuple(internal, int4, int2)
+RETURNS int2
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_examinetuple(internal, int4, internal, bool)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_end(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_reset(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION tsm_test_cost(internal, internal, internal, internal, internal, internal, internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+
+CREATE TABLESAMPLE METHOD tsm_test (
+ SEQSCAN = true,
+ PAGEMODE = true,
+ INIT = tsm_test_init,
+ NEXTBLOCK = tsm_test_nextblock,
+ NEXTTUPLE = tsm_test_nexttuple,
+ EXAMINETUPLE = tsm_test_examinetuple,
+ END = tsm_test_end,
+ RESET = tsm_test_reset,
+ COST = tsm_test_cost
+);
diff --git a/src/test/modules/tablesample/tsm_test.c b/src/test/modules/tablesample/tsm_test.c
new file mode 100644
index 0000000..5d76c78
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.c
@@ -0,0 +1,224 @@
+/*-------------------------------------------------------------------------
+ *
+ * tsm_test.c
+ * Simple example of a custom tablesample method
+ *
+ * Copyright (c) 2007-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/tablesample/tsm_test.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+
+#include "access/tablesample.h"
+#include "access/htup_details.h"
+#include "access/relscan.h"
+#include "access/tupdesc.h"
+#include "catalog/pg_type.h"
+#include "nodes/execnodes.h"
+#include "nodes/relation.h"
+#include "optimizer/clauses.h"
+#include "storage/bufmgr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/sampling.h"
+
+PG_MODULE_MAGIC;
+
+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ AttrNumber attnum; /* column to check */
+ BlockNumber startblock; /* starting block, we use this for syncscan support */
+ BlockNumber nblocks; /* total blocks in relation */
+ BlockNumber blockno; /* current block */
+ OffsetNumber lt; /* last tuple returned from current block */
+ SamplerRandomState randstate; /* random generator state */
+} TestSamplerState;
+
+
+PG_FUNCTION_INFO_V1(tsm_test_init);
+PG_FUNCTION_INFO_V1(tsm_test_nextblock);
+PG_FUNCTION_INFO_V1(tsm_test_nexttuple);
+PG_FUNCTION_INFO_V1(tsm_test_examinetuple);
+PG_FUNCTION_INFO_V1(tsm_test_end);
+PG_FUNCTION_INFO_V1(tsm_test_reset);
+PG_FUNCTION_INFO_V1(tsm_test_cost);
+
+/*
+ * Initialize the state.
+ */
+Datum
+tsm_test_init(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ uint32 seed = PG_GETARG_UINT32(1);
+ char *attname;
+ AttrNumber attnum;
+ Oid atttype;
+ HeapScanDesc scan = tsdesc->heapScan;
+ TestSamplerState *sampler;
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("attname cannot be NULL.")));
+
+ attname = text_to_cstring(PG_GETARG_TEXT_P(2));
+
+ attnum = get_attnum(scan->rs_rd->rd_id, attname);
+ if (!AttrNumberIsForUserDefinedAttr(attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s does not exist", attname)));
+
+ atttype = get_atttype(scan->rs_rd->rd_id, attnum);
+ if (atttype != FLOAT8OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid parameter for tablesample method tsm_test"),
+ errhint("column %s is not of type float.", attname)));
+
+ sampler = palloc0(sizeof(TestSamplerState));
+
+ /* Remember initial values for reinit */
+ sampler->seed = seed;
+ sampler->attnum = attnum;
+ sampler->startblock = scan->rs_startblock;
+ sampler->nblocks = scan->rs_nblocks;
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ tsdesc->tsmdata = sampler;
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Get next block number to read or InvalidBlockNumber if we are at the
+ * end of the relation.
+ */
+Datum
+tsm_test_nextblock(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ TestSamplerState *sampler = (TestSamplerState *) tsdesc->tsmdata;
+
+ /* Cycle from startblock to startblock to support syncscan. */
+ if (sampler->blockno == InvalidBlockNumber)
+ sampler->blockno = sampler->startblock;
+ else
+ {
+ sampler->blockno++;
+
+ if (sampler->blockno >= sampler->nblocks)
+ sampler->blockno = 0;
+
+ if (sampler->blockno == sampler->startblock)
+ PG_RETURN_UINT32(InvalidBlockNumber);
+ }
+
+ PG_RETURN_UINT32(sampler->blockno);
+}
+
+/*
+ * Get next tuple from current block.
+ */
+Datum
+tsm_test_nexttuple(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ OffsetNumber maxoffset = PG_GETARG_UINT16(2);
+ TestSamplerState *sampler = (TestSamplerState *) tsdesc->tsmdata;
+
+ if (sampler->lt == InvalidOffsetNumber)
+ sampler->lt = FirstOffsetNumber;
+ else if (++sampler->lt > maxoffset)
+ sampler->lt = InvalidOffsetNumber; /* page done; restart at the next block */
+
+ PG_RETURN_UINT16(sampler->lt);
+}
+
+/*
+ * Examine tuple and decide if it should be returned.
+ */
+Datum
+tsm_test_examinetuple(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ HeapTuple tuple = (HeapTuple) PG_GETARG_POINTER(2);
+ bool visible = PG_GETARG_BOOL(3);
+ TestSamplerState *sampler = (TestSamplerState *) tsdesc->tsmdata;
+ bool isnull;
+ float8 val, rand;
+
+ if (!visible)
+ PG_RETURN_BOOL(false);
+
+ val = DatumGetFloat8(heap_getattr(tuple, sampler->attnum, tsdesc->tupDesc, &isnull));
+ rand = sampler_random_fract(sampler->randstate);
+ if (isnull || val < rand)
+ PG_RETURN_BOOL(false);
+ else
+ PG_RETURN_BOOL(true);
+}
+
+/*
+ * Cleanup method.
+ */
+Datum
+tsm_test_end(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+
+ pfree(tsdesc->tsmdata);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Reset state (called by ReScan).
+ */
+Datum
+tsm_test_reset(PG_FUNCTION_ARGS)
+{
+ TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
+ TestSamplerState *sampler = (TestSamplerState *) tsdesc->tsmdata;
+
+ sampler->blockno = InvalidBlockNumber;
+ sampler->lt = InvalidOffsetNumber;
+
+ sampler_random_init_state(sampler->seed, sampler->randstate);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Costing function.
+ */
+Datum
+tsm_test_cost(PG_FUNCTION_ARGS)
+{
+ Path *path = (Path *) PG_GETARG_POINTER(1);
+ RelOptInfo *baserel = (RelOptInfo *) PG_GETARG_POINTER(2);
+ BlockNumber *pages = (BlockNumber *) PG_GETARG_POINTER(4);
+ double *tuples = (double *) PG_GETARG_POINTER(5);
+
+ *pages = baserel->pages;
+
+ /* This is a very crude estimate */
+ *tuples = path->rows = path->rows/2;
+
+ PG_RETURN_VOID();
+}
+
diff --git a/src/test/modules/tablesample/tsm_test.control b/src/test/modules/tablesample/tsm_test.control
new file mode 100644
index 0000000..a7b2741
--- /dev/null
+++ b/src/test/modules/tablesample/tsm_test.control
@@ -0,0 +1,5 @@
+# tsm_test extension
+comment = 'test module for custom tablesample method'
+default_version = '1.0'
+module_pathname = '$libdir/tsm_test'
+relocatable = true
--
1.9.1
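For readers following the example module: the block wraparound in tsm_test_nextblock can be tried outside the backend. The following is a minimal standalone sketch, not part of the patch. CycleState, cycle_init and cycle_next are hypothetical names, and BlockNumber / InvalidBlockNumber are redefined locally as stand-ins for the PostgreSQL definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the PostgreSQL types (assumptions for this sketch). */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Hypothetical state: just the fields the wraparound logic needs. */
typedef struct
{
	BlockNumber startblock;		/* block the scan starts at */
	BlockNumber nblocks;		/* total blocks in the relation */
	BlockNumber blockno;		/* current block, or InvalidBlockNumber */
} CycleState;

static void
cycle_init(CycleState *s, BlockNumber startblock, BlockNumber nblocks)
{
	/* This sketch only covers non-empty relations. */
	assert(nblocks > 0 && startblock < nblocks);
	s->startblock = startblock;
	s->nblocks = nblocks;
	s->blockno = InvalidBlockNumber;
}

/*
 * Return the next block to read, cycling from startblock around to
 * startblock, or InvalidBlockNumber once every block has been visited.
 */
static BlockNumber
cycle_next(CycleState *s)
{
	if (s->blockno == InvalidBlockNumber)
		s->blockno = s->startblock;
	else
	{
		s->blockno++;

		/* Wrap around at the end of the relation. */
		if (s->blockno >= s->nblocks)
			s->blockno = 0;

		/* Back at the start: the full circle is done. */
		if (s->blockno == s->startblock)
			return InvalidBlockNumber;
	}

	return s->blockno;
}
```

Starting at block 2 of a 4-block relation, successive calls yield 2, 3, 0, 1 and then InvalidBlockNumber.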
0005-tablesample-api-doc-v2.patch (binary/octet-stream)
From c36749fe10ceb98f2cd787ed5ad1659b8497ffc0 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Thu, 16 Apr 2015 20:35:14 +0200
Subject: [PATCH 5/6] tablesample-api-doc v2
---
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/postgres.sgml | 1 +
doc/src/sgml/tablesample-method.sgml | 139 +++++++++++++++++++++++++++++++++++
3 files changed, 141 insertions(+)
create mode 100644 doc/src/sgml/tablesample-method.sgml
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 2d7514c..4f6d868 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -98,6 +98,7 @@
<!ENTITY protocol SYSTEM "protocol.sgml">
<!ENTITY sources SYSTEM "sources.sgml">
<!ENTITY storage SYSTEM "storage.sgml">
+<!ENTITY tablesample-method SYSTEM "tablesample-method.sgml">
<!-- contrib information -->
<!ENTITY contrib SYSTEM "contrib.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index e378d69..46c2f93 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -249,6 +249,7 @@
&spgist;
&gin;
&brin;
+ &tablesample-method;
&storage;
&bki;
&planstats;
diff --git a/doc/src/sgml/tablesample-method.sgml b/doc/src/sgml/tablesample-method.sgml
new file mode 100644
index 0000000..48eb7fe
--- /dev/null
+++ b/doc/src/sgml/tablesample-method.sgml
@@ -0,0 +1,139 @@
+<!-- doc/src/sgml/tablesample-method.sgml -->
+
+<chapter id="tablesample-method">
+ <title>Writing A TABLESAMPLE Sampling Method</title>
+
+ <indexterm zone="tablesample-method">
+ <primary>tablesample method</primary>
+ </indexterm>
+
+ <para>
+ The <command>TABLESAMPLE</command> clause implementation in
+ <productname>PostgreSQL</> supports creating custom sampling methods.
+ These methods control what sample of the table will be returned when the
+ <command>TABLESAMPLE</command> clause is used.
+ </para>
+
+ <sect1 id="tablesample-method-functions">
+ <title>Tablesample Method Functions</title>
+
+ <para>
+ A tablesample method must provide the following set of functions:
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_init (TableSampleDesc *desc,
+ uint32 seed, ...);
+</programlisting>
+ Initialize the tablesample scan. The function is called at the beginning
+ of each relation scan.
+ </para>
+ <para>
+ Note that the first two parameters are required, but you can specify
+ additional parameters, which the <command>TABLESAMPLE</> clause then uses
+ to determine the arguments the user must supply in the query itself.
+ This means that if your function declares an additional float4 parameter
+ named pct, the user will have to call the tablesample method with an
+ expression that evaluates (or can be coerced) to float4.
+ For example, this definition:
+<programlisting>
+tsm_init (TableSampleDesc *desc,
+ uint32 seed, float4 pct);
+</programlisting>
+will lead to an SQL call like this:
+<programlisting>
+... TABLESAMPLE yourmethod(0.5) ...
+</programlisting>
+ </para>
+
+ <para>
+<programlisting>
+BlockNumber
+tsm_nextblock (TableSampleDesc *desc);
+</programlisting>
+ Returns the block number of the next page to be scanned. InvalidBlockNumber
+ should be returned if the sampling has reached the end of the relation.
+ </para>
+
+ <para>
+<programlisting>
+OffsetNumber
+tsm_nexttuple (TableSampleDesc *desc, BlockNumber blockno,
+ OffsetNumber maxoffset);
+</programlisting>
+ Returns the offset of the next tuple on the current page. InvalidOffsetNumber
+ should be returned if the sampling has reached the end of the page.
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_end (TableSampleDesc *desc);
+</programlisting>
+ The scan has finished; clean up any leftover state.
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_reset (TableSampleDesc *desc);
+</programlisting>
+ The relation needs to be rescanned; reset any tablesample method
+ state.
+ </para>
+
+ <para>
+<programlisting>
+void
+tsm_cost (PlannerInfo *root, Path *path, RelOptInfo *baserel,
+ List *args, BlockNumber *pages, double *tuples);
+</programlisting>
+ This function is used by the optimizer to choose the best plan and is also
+ used for the output of <command>EXPLAIN</>.
+ </para>
+
+ <para>
+ There is one more function that a tablesample method can implement in order
+ to gain more fine-grained control over sampling. This function is optional:
+ </para>
+
+ <para>
+<programlisting>
+bool
+tsm_examinetuple (TableSampleDesc *desc, BlockNumber blockno,
+ HeapTuple tuple, bool visible);
+</programlisting>
+ This function enables the sampling method to examine the contents of a tuple
+ (for example, to collect some internal statistics). Its return value
+ determines whether the tuple is returned to the client.
+ Note that this function also receives invisible tuples, but it is not
+ allowed to return true for such a tuple (if it does,
+ <productname>PostgreSQL</> will raise an error).
+ </para>
+
+ <para>
+ As you can see, most of the tablesample method interfaces take a
+ <structname>TableSampleDesc</> as their first parameter. This structure holds
+ the state of the current scan and also provides storage for the tablesample
+ method's state. It is defined as follows:
+<programlisting>
+typedef struct TableSampleDesc {
+ HeapScanDesc heapScan;
+ TupleDesc tupDesc;
+
+ void *tsmdata;
+} TableSampleDesc;
+</programlisting>
+ Here <structfield>heapScan</> is the descriptor of the physical table scan;
+ table size information can be obtained from it. The <structfield>tupDesc</>
+ field is the tuple descriptor of the tuples returned by the scan and passed
+ to the <function>tsm_examinetuple()</> interface. The <structfield>tsmdata</>
+ field can be used by the tablesample method itself to store any state it might
+ need during the scan. If used by the method, it should be <function>pfree</>d
+ in the <function>tsm_end()</> function.
+ </para>
+ </sect1>
+
+</chapter>
--
1.9.1
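The call order implied by the API description above (tsm_nextblock, then tsm_nexttuple until the page is exhausted) can be illustrated with a toy driver. This is only a sketch under stated assumptions: it is not the SampleScan executor code, the demo_* names and DemoState are hypothetical, the PostgreSQL types are redefined locally, and the "relation" is just an array holding each page's maximum offset. The toy method visits every block and returns every second tuple on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the PostgreSQL types (assumptions for this sketch). */
typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;
#define InvalidBlockNumber	((BlockNumber) 0xFFFFFFFF)
#define InvalidOffsetNumber ((OffsetNumber) 0)
#define FirstOffsetNumber	((OffsetNumber) 1)

/* Hypothetical per-scan state, playing the role of tsmdata. */
typedef struct
{
	BlockNumber nblocks;		/* total blocks in the relation */
	BlockNumber blockno;		/* current block */
	OffsetNumber lt;			/* last offset returned on this block */
} DemoState;

static void
demo_init(DemoState *st, BlockNumber nblocks)
{
	assert(st != NULL);
	st->nblocks = nblocks;
	st->blockno = InvalidBlockNumber;
	st->lt = InvalidOffsetNumber;
}

/* Visit blocks 0 .. nblocks-1 in order. */
static BlockNumber
demo_nextblock(DemoState *st)
{
	if (st->blockno == InvalidBlockNumber)
		st->blockno = 0;
	else
		st->blockno++;

	if (st->blockno >= st->nblocks)
		return InvalidBlockNumber;
	return st->blockno;
}

/* Return offsets 1, 3, 5, ... and reset the position at the end of the page. */
static OffsetNumber
demo_nexttuple(DemoState *st, OffsetNumber maxoffset)
{
	OffsetNumber next;

	if (st->lt == InvalidOffsetNumber)
		next = FirstOffsetNumber;
	else
		next = (OffsetNumber) (st->lt + 2);

	if (next > maxoffset)
		next = InvalidOffsetNumber;		/* page exhausted */

	st->lt = next;
	return next;
}

/* Toy driver: count how many tuple offsets the method hands back. */
static int
demo_scan(DemoState *st, const OffsetNumber *maxoffsets)
{
	int			returned = 0;
	BlockNumber blk;

	while ((blk = demo_nextblock(st)) != InvalidBlockNumber)
	{
		while (demo_nexttuple(st, maxoffsets[blk]) != InvalidOffsetNumber)
			returned++;
	}
	return returned;
}
```

With three pages whose maximum offsets are 5, 0 and 4, the driver receives offsets 1, 3, 5 from the first page, nothing from the empty page, and 1, 3 from the last page, for five tuples in total.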
0001-separate-block-sampling-functions-v2.patch (binary/octet-stream)
From 686f1adc77833f1d28f1a6d7dac5f3d65fd3e9e2 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 7 Jan 2015 23:36:56 +0100
Subject: [PATCH 1/6] separate block sampling functions v2
---
contrib/file_fdw/file_fdw.c | 9 +-
contrib/postgres_fdw/postgres_fdw.c | 10 +-
src/backend/commands/analyze.c | 225 +----------------------------------
src/backend/utils/misc/Makefile | 2 +-
src/backend/utils/misc/sampling.c | 226 ++++++++++++++++++++++++++++++++++++
src/include/commands/vacuum.h | 3 -
src/include/utils/sampling.h | 44 +++++++
7 files changed, 287 insertions(+), 232 deletions(-)
create mode 100644 src/backend/utils/misc/sampling.c
create mode 100644 src/include/utils/sampling.h
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 4368897..249d541 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -34,6 +34,7 @@
#include "optimizer/var.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -1005,7 +1006,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
{
int numrows = 0;
double rowstoskip = -1; /* -1 means not set yet */
- double rstate;
+ ReservoirStateData rstate;
TupleDesc tupDesc;
Datum *values;
bool *nulls;
@@ -1043,7 +1044,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
ALLOCSET_DEFAULT_MAXSIZE);
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Set up callback to identify error line number. */
errcallback.callback = CopyFromErrorCallback;
@@ -1087,7 +1088,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* not-yet-incremented value of totalrows as t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(*totalrows, targrows, &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, *totalrows, targrows);
if (rowstoskip <= 0)
{
@@ -1095,7 +1096,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one old tuple
* at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 478e124..74ef792 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -37,6 +37,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/sampling.h"
PG_MODULE_MAGIC;
@@ -202,7 +203,7 @@ typedef struct PgFdwAnalyzeState
/* for random sampling */
double samplerows; /* # of rows fetched */
double rowstoskip; /* # of rows to skip before next sample */
- double rstate; /* random state */
+ ReservoirStateData rstate; /* state for reservoir sampling */
/* working memory contexts */
MemoryContext anl_cxt; /* context for per-analyze lifespan data */
@@ -2397,7 +2398,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
astate.numrows = 0;
astate.samplerows = 0;
astate.rowstoskip = -1; /* -1 means not set yet */
- astate.rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&astate.rstate, targrows);
/* Remember ANALYZE context, and create a per-tuple temp context */
astate.anl_cxt = CurrentMemoryContext;
@@ -2537,13 +2538,12 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
* analyze.c; see Jeff Vitter's paper.
*/
if (astate->rowstoskip < 0)
- astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
- &astate->rstate);
+ astate->rowstoskip = reservoir_get_next_S(&astate->rstate, astate->samplerows, targrows);
if (astate->rowstoskip <= 0)
{
/* Choose a random reservoir element to replace. */
- pos = (int) (targrows * anl_random_fract());
+ pos = (int) (targrows * sampler_random_fract());
Assert(pos >= 0 && pos < targrows);
heap_freetuple(astate->rows[pos]);
}
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 15ec0ad..952cf20 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -50,23 +50,13 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_rusage.h"
+#include "utils/sampling.h"
#include "utils/sortsupport.h"
#include "utils/syscache.h"
#include "utils/timestamp.h"
#include "utils/tqual.h"
-/* Data structure for Algorithm S from Knuth 3.4.2 */
-typedef struct
-{
- BlockNumber N; /* number of blocks, known in advance */
- int n; /* desired sample size */
- BlockNumber t; /* current block number */
- int m; /* blocks selected so far */
-} BlockSamplerData;
-
-typedef BlockSamplerData *BlockSampler;
-
/* Per-index data for ANALYZE */
typedef struct AnlIndexData
{
@@ -89,10 +79,6 @@ static void do_analyze_rel(Relation onerel, int options,
VacuumParams *params, List *va_cols,
AcquireSampleRowsFunc acquirefunc, BlockNumber relpages,
bool inh, bool in_outer_xact, int elevel);
-static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
- int samplesize);
-static bool BlockSampler_HasMore(BlockSampler bs);
-static BlockNumber BlockSampler_Next(BlockSampler bs);
static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
@@ -951,94 +937,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
}
/*
- * BlockSampler_Init -- prepare for random sampling of blocknumbers
- *
- * BlockSampler is used for stage one of our new two-stage tuple
- * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
- * "Large DB"). It selects a random sample of samplesize blocks out of
- * the nblocks blocks in the table. If the table has less than
- * samplesize blocks, all blocks are selected.
- *
- * Since we know the total number of blocks in advance, we can use the
- * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
- * algorithm.
- */
-static void
-BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize)
-{
- bs->N = nblocks; /* measured table size */
-
- /*
- * If we decide to reduce samplesize for tables that have less or not much
- * more than samplesize blocks, here is the place to do it.
- */
- bs->n = samplesize;
- bs->t = 0; /* blocks scanned so far */
- bs->m = 0; /* blocks selected so far */
-}
-
-static bool
-BlockSampler_HasMore(BlockSampler bs)
-{
- return (bs->t < bs->N) && (bs->m < bs->n);
-}
-
-static BlockNumber
-BlockSampler_Next(BlockSampler bs)
-{
- BlockNumber K = bs->N - bs->t; /* remaining blocks */
- int k = bs->n - bs->m; /* blocks still to sample */
- double p; /* probability to skip block */
- double V; /* random */
-
- Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
-
- if ((BlockNumber) k >= K)
- {
- /* need all the rest */
- bs->m++;
- return bs->t++;
- }
-
- /*----------
- * It is not obvious that this code matches Knuth's Algorithm S.
- * Knuth says to skip the current block with probability 1 - k/K.
- * If we are to skip, we should advance t (hence decrease K), and
- * repeat the same probabilistic test for the next block. The naive
- * implementation thus requires an anl_random_fract() call for each block
- * number. But we can reduce this to one anl_random_fract() call per
- * selected block, by noting that each time the while-test succeeds,
- * we can reinterpret V as a uniform random number in the range 0 to p.
- * Therefore, instead of choosing a new V, we just adjust p to be
- * the appropriate fraction of its former value, and our next loop
- * makes the appropriate probabilistic test.
- *
- * We have initially K > k > 0. If the loop reduces K to equal k,
- * the next while-test must fail since p will become exactly zero
- * (we assume there will not be roundoff error in the division).
- * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
- * to be doubly sure about roundoff error.) Therefore K cannot become
- * less than k, which means that we cannot fail to select enough blocks.
- *----------
- */
- V = anl_random_fract();
- p = 1.0 - (double) k / (double) K;
- while (V < p)
- {
- /* skip */
- bs->t++;
- K--; /* keep K == N - t */
-
- /* adjust p to be new cutoff point in reduced range */
- p *= 1.0 - (double) k / (double) K;
- }
-
- /* select */
- bs->m++;
- return bs->t++;
-}
-
-/*
* acquire_sample_rows -- acquire a random sample of rows from the table
*
* Selected rows are returned in the caller-allocated array rows[], which
@@ -1084,7 +982,7 @@ acquire_sample_rows(Relation onerel, int elevel,
BlockNumber totalblocks;
TransactionId OldestXmin;
BlockSamplerData bs;
- double rstate;
+ ReservoirStateData rstate;
Assert(targrows > 0);
@@ -1094,9 +992,9 @@ acquire_sample_rows(Relation onerel, int elevel,
OldestXmin = GetOldestXmin(onerel, true);
/* Prepare for sampling block numbers */
- BlockSampler_Init(&bs, totalblocks, targrows);
+ BlockSampler_Init(&bs, totalblocks, targrows, random());
/* Prepare for sampling rows */
- rstate = anl_init_selection_state(targrows);
+ reservoir_init_selection_state(&rstate, targrows);
/* Outer loop over blocks to sample */
while (BlockSampler_HasMore(&bs))
@@ -1244,8 +1142,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* t.
*/
if (rowstoskip < 0)
- rowstoskip = anl_get_next_S(samplerows, targrows,
- &rstate);
+ rowstoskip = reservoir_get_next_S(&rstate, samplerows, targrows);
if (rowstoskip <= 0)
{
@@ -1253,7 +1150,7 @@ acquire_sample_rows(Relation onerel, int elevel,
* Found a suitable tuple, so save it, replacing one
* old tuple at random
*/
- int k = (int) (targrows * anl_random_fract());
+ int k = (int) (targrows * sampler_random_fract());
Assert(k >= 0 && k < targrows);
heap_freetuple(rows[k]);
@@ -1312,116 +1209,6 @@ acquire_sample_rows(Relation onerel, int elevel,
return numrows;
}
-/* Select a random value R uniformly distributed in (0 - 1) */
-double
-anl_random_fract(void)
-{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
-}
-
-/*
- * These two routines embody Algorithm Z from "Random sampling with a
- * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
- * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
- * of the count S of records to skip before processing another record.
- * It is computed primarily based on t, the number of records already read.
- * The only extra state needed between calls is W, a random state variable.
- *
- * anl_init_selection_state computes the initial W value.
- *
- * Given that we've already read t records (t >= n), anl_get_next_S
- * determines the number of records to skip before the next record is
- * processed.
- */
-double
-anl_init_selection_state(int n)
-{
- /* Initial value of W (for use when Algorithm Z is first applied) */
- return exp(-log(anl_random_fract()) / n);
-}
-
-double
-anl_get_next_S(double t, int n, double *stateptr)
-{
- double S;
-
- /* The magic constant here is T from Vitter's paper */
- if (t <= (22.0 * n))
- {
- /* Process records using Algorithm X until t is large enough */
- double V,
- quot;
-
- V = anl_random_fract(); /* Generate V */
- S = 0;
- t += 1;
- /* Note: "num" in Vitter's code is always equal to t - n */
- quot = (t - (double) n) / t;
- /* Find min S satisfying (4.1) */
- while (quot > V)
- {
- S += 1;
- t += 1;
- quot *= (t - (double) n) / t;
- }
- }
- else
- {
- /* Now apply Algorithm Z */
- double W = *stateptr;
- double term = t - (double) n + 1;
-
- for (;;)
- {
- double numer,
- numer_lim,
- denom;
- double U,
- X,
- lhs,
- rhs,
- y,
- tmp;
-
- /* Generate U and X */
- U = anl_random_fract();
- X = t * (W - 1.0);
- S = floor(X); /* S is tentatively set to floor(X) */
- /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
- tmp = (t + 1) / term;
- lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
- rhs = (((t + X) / (term + S)) * term) / t;
- if (lhs <= rhs)
- {
- W = rhs / lhs;
- break;
- }
- /* Test if U <= f(S)/cg(X) */
- y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
- if ((double) n < S)
- {
- denom = t;
- numer_lim = term + S;
- }
- else
- {
- denom = t - (double) n + S;
- numer_lim = t + 1;
- }
- for (numer = t + S; numer >= numer_lim; numer -= 1)
- {
- y *= numer / denom;
- denom -= 1;
- }
- W = exp(-log(anl_random_fract()) / n); /* Generate W in advance */
- if (exp(log(y) / n) <= (t + X) / t)
- break;
- }
- *stateptr = W;
- }
- return S;
-}
-
/*
* qsort comparator for sorting rows[] array
*/
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 378b77e..7889101 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
OBJS = guc.o help_config.o pg_rusage.o ps_status.o rls.o \
- superuser.o timeout.o tzparser.o
+ sampling.o superuser.o timeout.o tzparser.o
# This location might depend on the installation directories. Therefore
# we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
new file mode 100644
index 0000000..1eeabaf
--- /dev/null
+++ b/src/backend/utils/misc/sampling.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.c
+ * Relation block sampling routines.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/misc/sampling.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "utils/sampling.h"
+
+
+/*
+ * BlockSampler_Init -- prepare for random sampling of blocknumbers
+ *
+ * BlockSampler provides algorithm for block level sampling of a relation
+ * as discussed on pgsql-hackers 2004-04-02 (subject "Large DB")
+ * It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.
+ *
+ * Since we know the total number of blocks in advance, we can use the
+ * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
+ * algorithm.
+ */
+void
+BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
+ long randseed)
+{
+ bs->N = nblocks; /* measured table size */
+
+ /*
+ * If we decide to reduce samplesize for tables that have less or not much
+ * more than samplesize blocks, here is the place to do it.
+ */
+ bs->n = samplesize;
+ bs->t = 0; /* blocks scanned so far */
+ bs->m = 0; /* blocks selected so far */
+}
+
+bool
+BlockSampler_HasMore(BlockSampler bs)
+{
+ return (bs->t < bs->N) && (bs->m < bs->n);
+}
+
+BlockNumber
+BlockSampler_Next(BlockSampler bs)
+{
+ BlockNumber K = bs->N - bs->t; /* remaining blocks */
+ int k = bs->n - bs->m; /* blocks still to sample */
+ double p; /* probability to skip block */
+ double V; /* random */
+
+ Assert(BlockSampler_HasMore(bs)); /* hence K > 0 and k > 0 */
+
+ if ((BlockNumber) k >= K)
+ {
+ /* need all the rest */
+ bs->m++;
+ return bs->t++;
+ }
+
+ /*----------
+ * It is not obvious that this code matches Knuth's Algorithm S.
+ * Knuth says to skip the current block with probability 1 - k/K.
+ * If we are to skip, we should advance t (hence decrease K), and
+ * repeat the same probabilistic test for the next block. The naive
+ * implementation thus requires a sampler_random_fract() call for each
+ * block number. But we can reduce this to one sampler_random_fract()
+ * call per selected block, by noting that each time the while-test
+ * succeeds, we can reinterpret V as a uniform random number in the range
+ * 0 to p. Therefore, instead of choosing a new V, we just adjust p to be
+ * the appropriate fraction of its former value, and our next loop
+ * makes the appropriate probabilistic test.
+ *
+ * We have initially K > k > 0. If the loop reduces K to equal k,
+ * the next while-test must fail since p will become exactly zero
+ * (we assume there will not be roundoff error in the division).
+ * (Note: Knuth suggests a "<=" loop condition, but we use "<" just
+ * to be doubly sure about roundoff error.) Therefore K cannot become
+ * less than k, which means that we cannot fail to select enough blocks.
+ *----------
+ */
+ V = sampler_random_fract();
+ p = 1.0 - (double) k / (double) K;
+ while (V < p)
+ {
+ /* skip */
+ bs->t++;
+ K--; /* keep K == N - t */
+
+ /* adjust p to be new cutoff point in reduced range */
+ p *= 1.0 - (double) k / (double) K;
+ }
+
+ /* select */
+ bs->m++;
+ return bs->t++;
+}
+
+/*
+ * These two routines embody Algorithm Z from "Random sampling with a
+ * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
+ * (Mar. 1985), Pages 37-57. Vitter describes his algorithm in terms
+ * of the count S of records to skip before processing another record.
+ * It is computed primarily based on t, the number of records already read.
+ * The only extra state needed between calls is W, a random state variable.
+ *
+ * reservoir_init_selection_state computes the initial W value.
+ *
+ * Given that we've already read t records (t >= n), reservoir_get_next_S
+ * determines the number of records to skip before the next record is
+ * processed.
+ */
+void
+reservoir_init_selection_state(ReservoirState rs, int n)
+{
+ /* Initial value of W (for use when Algorithm Z is first applied) */
+ *rs = exp(-log(sampler_random_fract()) / n);
+}
+
+double
+reservoir_get_next_S(ReservoirState rs, double t, int n)
+{
+ double S;
+
+ /* The magic constant here is T from Vitter's paper */
+ if (t <= (22.0 * n))
+ {
+ /* Process records using Algorithm X until t is large enough */
+ double V,
+ quot;
+
+ V = sampler_random_fract(); /* Generate V */
+ S = 0;
+ t += 1;
+ /* Note: "num" in Vitter's code is always equal to t - n */
+ quot = (t - (double) n) / t;
+ /* Find min S satisfying (4.1) */
+ while (quot > V)
+ {
+ S += 1;
+ t += 1;
+ quot *= (t - (double) n) / t;
+ }
+ }
+ else
+ {
+ /* Now apply Algorithm Z */
+ double W = *rs;
+ double term = t - (double) n + 1;
+
+ for (;;)
+ {
+ double numer,
+ numer_lim,
+ denom;
+ double U,
+ X,
+ lhs,
+ rhs,
+ y,
+ tmp;
+
+ /* Generate U and X */
+ U = sampler_random_fract();
+ X = t * (W - 1.0);
+ S = floor(X); /* S is tentatively set to floor(X) */
+ /* Test if U <= h(S)/cg(X) in the manner of (6.3) */
+ tmp = (t + 1) / term;
+ lhs = exp(log(((U * tmp * tmp) * (term + S)) / (t + X)) / n);
+ rhs = (((t + X) / (term + S)) * term) / t;
+ if (lhs <= rhs)
+ {
+ W = rhs / lhs;
+ break;
+ }
+ /* Test if U <= f(S)/cg(X) */
+ y = (((U * (t + 1)) / term) * (t + S + 1)) / (t + X);
+ if ((double) n < S)
+ {
+ denom = t;
+ numer_lim = term + S;
+ }
+ else
+ {
+ denom = t - (double) n + S;
+ numer_lim = t + 1;
+ }
+ for (numer = t + S; numer >= numer_lim; numer -= 1)
+ {
+ y *= numer / denom;
+ denom -= 1;
+ }
+ W = exp(-log(sampler_random_fract()) / n); /* Generate W in advance */
+ if (exp(log(y) / n) <= (t + X) / t)
+ break;
+ }
+ *rs = W;
+ }
+ return S;
+}
+
+
+/*----------
+ * Random number generator used by sampling
+ *----------
+ */
+
+/* Select a random value R uniformly distributed in (0 - 1) */
+double
+sampler_random_fract(void)
+{
+ return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+}
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 71f0165..ce7b28d 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -197,8 +197,5 @@ extern void analyze_rel(Oid relid, RangeVar *relation, int options,
VacuumParams *params, List *va_cols, bool in_outer_xact,
BufferAccessStrategy bstrategy);
extern bool std_typanalyze(VacAttrStats *stats);
-extern double anl_random_fract(void);
-extern double anl_init_selection_state(int n);
-extern double anl_get_next_S(double t, int n, double *stateptr);
#endif /* VACUUM_H */
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
new file mode 100644
index 0000000..e3e7f9c
--- /dev/null
+++ b/src/include/utils/sampling.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * sampling.h
+ * definitions for sampling functions
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/sampling.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SAMPLING_H
+#define SAMPLING_H
+
+#include "storage/bufmgr.h"
+
+extern double sampler_random_fract(void);
+
+/* Block sampling methods */
+/* Data structure for Algorithm S from Knuth 3.4.2 */
+typedef struct
+{
+ BlockNumber N; /* number of blocks, known in advance */
+ int n; /* desired sample size */
+ BlockNumber t; /* current block number */
+ int m; /* blocks selected so far */
+} BlockSamplerData;
+
+typedef BlockSamplerData *BlockSampler;
+
+extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+ int samplesize, long randseed);
+extern bool BlockSampler_HasMore(BlockSampler bs);
+extern BlockNumber BlockSampler_Next(BlockSampler bs);
+
+/* Reservoir sampling methods */
+typedef double ReservoirStateData;
+typedef ReservoirStateData *ReservoirState;
+
+extern void reservoir_init_selection_state(ReservoirState rs, int n);
+extern double reservoir_get_next_S(ReservoirState rs, double t, int n);
+
+#endif /* SAMPLING_H */
--
1.9.1
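For anyone who wants to try the single-draw trick from the BlockSampler_Next comment outside the backend, here is a minimal standalone sketch. It is not part of the patch: DemoSampler, demo_random_fract, demo_init, demo_has_more, demo_next and demo_collect are names made up for illustration, and rand()/RAND_MAX stand in for random()/MAX_RANDOM_VALUE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for BlockSamplerData; not the PostgreSQL type. */
typedef struct
{
	unsigned	N;			/* number of blocks, known in advance */
	int			n;			/* desired sample size */
	unsigned	t;			/* next block number to consider */
	int			m;			/* blocks selected so far */
} DemoSampler;

/* Uniform value in the open interval (0, 1), same shape as sampler_random_fract. */
static double
demo_random_fract(void)
{
	return ((double) rand() + 1) / ((double) RAND_MAX + 2);
}

static void
demo_init(DemoSampler *bs, unsigned nblocks, int samplesize)
{
	bs->N = nblocks;
	bs->n = samplesize;
	bs->t = 0;
	bs->m = 0;
}

static bool
demo_has_more(const DemoSampler *bs)
{
	return (bs->t < bs->N) && (bs->m < bs->n);
}

/* Knuth's Algorithm S, using one random draw per selected block. */
static unsigned
demo_next(DemoSampler *bs)
{
	unsigned	K = bs->N - bs->t;	/* remaining blocks */
	int			k = bs->n - bs->m;	/* blocks still to sample */
	double		p;					/* probability of skipping a block */
	double		V;

	assert(demo_has_more(bs));		/* hence K > 0 and k > 0 */

	if ((unsigned) k >= K)
	{
		/* need all the rest */
		bs->m++;
		return bs->t++;
	}

	V = demo_random_fract();
	p = 1.0 - (double) k / (double) K;
	while (V < p)
	{
		/* skip this block and shrink the cutoff instead of redrawing V */
		bs->t++;
		K--;
		p *= 1.0 - (double) k / (double) K;
	}

	/* select */
	bs->m++;
	return bs->t++;
}

/* Collect the sample; out[] needs room for min(nblocks, samplesize) entries. */
static int
demo_collect(unsigned nblocks, int samplesize, unsigned *out)
{
	DemoSampler bs;
	int			count = 0;

	demo_init(&bs, nblocks, samplesize);
	while (demo_has_more(&bs))
		out[count++] = demo_next(&bs);
	return count;
}
```

With demo_collect(100, 10, out) this should give 10 block numbers in increasing order; when the table has fewer blocks than samplesize, every block comes back.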
On 17 April 2015 at 14:54, Petr Jelinek <petr@2ndquadrant.com> wrote:
I agree that DDL patch is not that important to get in (and I made it last
patch in the series now), which does not mean somebody can't write the
extension with new tablesample method.
In any case attached another version.
Changes:
- I addressed the comments from Michael
- I moved the interface between nodeSampleScan and the actual sampling
method to its own .c file and added TableSampleDesc struct for it. This
makes the interface cleaner and will make it more straightforward to extend
for subqueries in the future (nothing really changes just some functions
were renamed and moved). Amit suggested this at some point and I thought
it's not needed at that time but with the possible future extension to
subquery support I changed my mind.
- renamed heap_beginscan_ss to heap_beginscan_sampling to avoid confusion
with sync scan
- reworded some things and more typo fixes
- Added two sample contrib modules demonstrating row limited and time
limited sampling. I am using linear probing for both of those as the
builtin block sampling is not well suited for row limited or time limited
sampling. For row limited I originally thought of using the Vitter's
reservoir sampling but that does not fit well with the executor as it needs
to keep the reservoir of all the output tuples in memory which would have
horrible memory requirements if the limit was high. The linear probing
seems to work quite well for the use case of "give me 500 random rows from
table".
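The contrib modules themselves are not shown in this part of the thread, so the following is only a guess at what "linear probing" over blocks could look like: start at a random block and advance by a fixed step that is coprime to the number of blocks, so every block is visited exactly once and the scan can stop as soon as the row limit is reached. DemoProbe and the demo_* functions are hypothetical names, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Greatest common divisor (Euclid). */
static uint32_t
demo_gcd(uint32_t a, uint32_t b)
{
	while (b != 0)
	{
		uint32_t	r = a % b;

		a = b;
		b = r;
	}
	return a;
}

/* Pick a step in [1, nblocks) that is coprime to nblocks. */
static uint32_t
demo_pick_step(uint32_t nblocks)
{
	uint32_t	step;

	if (nblocks <= 2)
		return 1;
	do
	{
		step = 1 + (uint32_t) rand() % (nblocks - 1);
	} while (demo_gcd(step, nblocks) != 1);
	return step;
}

typedef struct
{
	uint32_t	nblocks;	/* total blocks in the relation */
	uint32_t	step;		/* probe stride, coprime to nblocks */
	uint32_t	next;		/* next block to return */
	uint32_t	visited;	/* blocks returned so far */
} DemoProbe;

static void
demo_probe_init(DemoProbe *p, uint32_t nblocks)
{
	p->nblocks = nblocks;
	p->step = demo_pick_step(nblocks);
	p->next = (nblocks > 0) ? (uint32_t) rand() % nblocks : 0;
	p->visited = 0;
}

/* Store the next block in *blk and return 1, or return 0 when all were visited. */
static int
demo_probe_next(DemoProbe *p, uint32_t *blk)
{
	assert(blk != NULL);
	if (p->visited >= p->nblocks)
		return 0;
	*blk = p->next;
	p->next = (uint32_t) (((uint64_t) p->next + p->step) % p->nblocks);
	p->visited++;
	return 1;
}
```

A caller would read tuples from each returned block and simply stop calling demo_probe_next once it has enough rows (or has run out of time), which is why no buffer of output tuples is needed.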
For me, the DDL changes are something we can leave out for now, as a way to
minimize the change surface.
I'm now moving to final review of patches 1-5. Michael requested patch 1 to
be split out. If I commit, I will keep that split, but I am considering all
of this as a single patchset. I've already spent a few days reviewing, so I
don't expect this will take much longer.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:
On 10/04/15 06:46, Michael Paquier wrote:
13) Some regression tests with pg_tablesample_method would be welcome.
Not sure what you mean by that.
I meant a sanity check on pg_tablesample_method to be sure that
tsminit, tsmnextblock and tsmnexttuple are always defined as they are
mandatory functions. So the idea is to add a query like this and to be
sure that it returns no rows:
SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;
- Added two sample contrib modules demonstrating row limited and time
limited sampling. I am using linear probing for both of those as the builtin
block sampling is not well suited for row limited or time limited sampling.
For row limited I originally thought of using the Vitter's reservoir
sampling but that does not fit well with the executor as it needs to keep
the reservoir of all the output tuples in memory which would have horrible
memory requirements if the limit was high. The linear probing seems to work
quite well for the use case of "give me 500 random rows from table".
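To illustrate the memory concern with reservoir sampling mentioned above, here is a small sketch using the simple Algorithm R rather than Vitter's Algorithm Z from the patch (both keep the same kind of reservoir). demo_reservoir_sample is a hypothetical name, the input is a plain int array instead of tuples, and rand() with modulo is used for brevity even though it is slightly biased.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Keep a uniform sample of up to k items from input[0..n-1] in reservoir[].
 * Returns the number of slots filled (min(n, k)).
 */
static size_t
demo_reservoir_sample(const int *input, size_t n, int *reservoir, size_t k)
{
	size_t		i;
	size_t		filled = 0;

	assert(reservoir != NULL || k == 0);

	for (i = 0; i < n; i++)
	{
		if (filled < k)
		{
			/* the first k items go straight into the reservoir */
			reservoir[filled++] = input[i];
		}
		else
		{
			/* item i replaces a random slot with probability k / (i + 1) */
			size_t		j = (size_t) rand() % (i + 1);

			if (j < k)
				reservoir[j] = input[i];
		}
	}
	return filled;
}
```

Any slot can still be overwritten until the last input item has been seen, so nothing can be returned early and all k sampled rows have to stay in memory for the whole scan.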
Patch 4 is interesting, it shows a direct use of examinetuple to
filter the output.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Apr 18, 2015 at 8:38 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:
On 10/04/15 06:46, Michael Paquier wrote:
13) Some regression tests with pg_tablesample_method would be welcome.
Not sure what you mean by that.
I meant a sanity check on pg_tablesample_method to be sure that
tsminit, tsmnextblock and tsmnexttuple are always defined as they are
mandatory functions. So the idea is to add a query like this and to be
sure that it returns no rows:
SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;
Yesterday was a long day. I meant IS NULL and not IS NOT NULL, but I
am sure you guessed it that way already..
--
Michael
On 19/04/15 01:24, Michael Paquier wrote:
On Sat, Apr 18, 2015 at 8:38 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:
On 10/04/15 06:46, Michael Paquier wrote:
13) Some regression tests with pg_tablesample_method would be welcome.
Not sure what you mean by that.
I meant a sanity check on pg_tablesample_method to be sure that
tsminit, tsmnextblock and tsmnexttuple are always defined as they are
mandatory functions. So the idea is to add a query like this and to be
sure that it returns no rows:
SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;
Yesterday was a long day. I meant IS NULL and not IS NOT NULL, but I
am sure you guessed it that way already..
Yes, I guessed that, and it's a very reasonable request. I guess it should
look like the attached (I don't want to send a new version of everything
just for this).
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0007-tablesample-add-catalog-regression-test.patch (text/x-diff)
From d059c792a1e864e38f321cea51169ea0b3c5caab Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Wed, 22 Apr 2015 21:28:29 +0200
Subject: [PATCH 7/7] tablesample: add catalog regression test
---
src/test/regress/expected/tablesample.out | 15 +++++++++++++++
src/test/regress/sql/tablesample.sql | 13 +++++++++++++
2 files changed, 28 insertions(+)
diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
index 271638d..04e5eb8 100644
--- a/src/test/regress/expected/tablesample.out
+++ b/src/test/regress/expected/tablesample.out
@@ -209,6 +209,21 @@ SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5);
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: ...CT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPL...
^
+-- catalog sanity
+SELECT *
+FROM pg_tablesample_method
+WHERE tsminit IS NULL
+ OR tsmseqscan IS NULL
+ OR tsmpagemode IS NULL
+ OR tsmnextblock IS NULL
+ OR tsmnexttuple IS NULL
+ OR tsmend IS NULL
+ OR tsmreset IS NULL
+ OR tsmcost IS NULL;
+ tsmname | tsmseqscan | tsmpagemode | tsminit | tsmnextblock | tsmnexttuple | tsmexaminetuple | tsmend | tsmreset | tsmcost
+---------+------------+-------------+---------+--------------+--------------+-----------------+--------+----------+---------
+(0 rows)
+
-- done
DROP TABLE test_tablesample CASCADE;
NOTICE: drop cascades to 2 other objects
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
index 2f4b7de..7b3eb9b 100644
--- a/src/test/regress/sql/tablesample.sql
+++ b/src/test/regress/sql/tablesample.sql
@@ -57,5 +57,18 @@ SELECT * FROM query_select TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5);
+-- catalog sanity
+
+SELECT *
+FROM pg_tablesample_method
+WHERE tsminit IS NULL
+ OR tsmseqscan IS NULL
+ OR tsmpagemode IS NULL
+ OR tsmnextblock IS NULL
+ OR tsmnexttuple IS NULL
+ OR tsmend IS NULL
+ OR tsmreset IS NULL
+ OR tsmcost IS NULL;
+
-- done
DROP TABLE test_tablesample CASCADE;
--
1.9.1
On Thu, Apr 23, 2015 at 4:31 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 19/04/15 01:24, Michael Paquier wrote:
On Sat, Apr 18, 2015 at 8:38 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:
On 10/04/15 06:46, Michael Paquier wrote:
13) Some regression tests with pg_tablesample_method would be welcome.
Not sure what you mean by that.
I meant a sanity check on pg_tablesample_method to be sure that
tsminit, tsmnextblock and tsmnexttuple are always defined as they are
mandatory functions. So the idea is to add a query like this and to be
sure that it returns no rows:
SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;
Yesterday was a long day. I meant IS NULL and not IS NOT NULL, but I
am sure you guessed it that way already..
Yes I guessed that and it's very reasonable request, I guess it should look
like the attached (I don't want to send new version of everything just for
this).
Thanks. That's exactly the idea.
--
Michael