From 813e641c9ae202b8bfb4be4b978bb2c22a60eea4 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sun, 15 Mar 2015 17:39:22 +0100
Subject: [PATCH 4/4] tablesample api doc v1

---
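Note for reviewers (sits below the "---" line, so it is not part of the
commit): to make the callback contract documented in this patch easier to
follow, here is a minimal standalone sketch of a block-level sampler. It is
not the example under src/test/modules/tablesample. The typedefs, the reduced
SampleScanState with its nblocks field, DemoState and demo_random are all
stand-ins invented for this sketch so that it compiles without the server
headers, and treating the extra argument as a percentage (0 to 100) is an
assumption as well.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for the server's types; assumptions so the sketch compiles alone. */
typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;
#define InvalidBlockNumber  ((BlockNumber) 0xFFFFFFFF)
#define InvalidOffsetNumber ((OffsetNumber) 0)

/* Hypothetical, reduced scan state: only what this sketch needs. */
typedef struct SampleScanState
{
    BlockNumber nblocks;    /* assumed: number of blocks in the relation */
    void       *tsmdata;    /* private state of the sampling method */
} SampleScanState;

/* Hypothetical private state kept in tsmdata. */
typedef struct DemoState
{
    uint32_t     seed;      /* remembered so a rescan repeats the sample */
    uint32_t     rng;       /* state of a toy linear congruential generator */
    double       fraction;  /* share of blocks to sample, 0.0 .. 1.0 */
    BlockNumber  nextblk;   /* next block to consider */
    OffsetNumber lastoff;   /* last offset returned on the current block */
} DemoState;

/* Toy generator returning a value in [0, 1); not statistically rigorous. */
static double
demo_random(DemoState *st)
{
    st->rng = st->rng * 1664525u + 1013904223u;
    return (st->rng >> 8) / 16777216.0;
}

void
tsm_init(SampleScanState *scanstate, uint32_t seed, float pct)
{
    DemoState *st = malloc(sizeof(DemoState));

    assert(st != NULL && pct >= 0.0f && pct <= 100.0f);
    st->seed = st->rng = seed;
    st->fraction = pct / 100.0;
    st->nextblk = 0;
    st->lastoff = InvalidOffsetNumber;
    scanstate->tsmdata = st;
}

/* Walk the blocks in order, keeping each one with probability "fraction". */
BlockNumber
tsm_nextblock(SampleScanState *scanstate)
{
    DemoState *st = scanstate->tsmdata;

    while (st->nextblk < scanstate->nblocks)
    {
        BlockNumber blk = st->nextblk++;

        if (demo_random(st) < st->fraction)
        {
            st->lastoff = InvalidOffsetNumber;  /* start of a new page */
            return blk;
        }
    }
    return InvalidBlockNumber;                  /* end of the relation */
}

/* Return every offset on the chosen page, then signal the end of the page. */
OffsetNumber
tsm_nexttuple(SampleScanState *scanstate, BlockNumber blockno,
              OffsetNumber maxoffset)
{
    DemoState *st = scanstate->tsmdata;

    (void) blockno;                             /* unused in this sketch */
    if (st->lastoff >= maxoffset)
        return InvalidOffsetNumber;
    return ++st->lastoff;
}

/* Rewind to the first block and replay the same random sequence. */
void
tsm_reset(SampleScanState *scanstate)
{
    DemoState *st = scanstate->tsmdata;

    st->rng = st->seed;
    st->nextblk = 0;
    st->lastoff = InvalidOffsetNumber;
}

void
tsm_end(SampleScanState *scanstate)
{
    free(scanstate->tsmdata);
    scanstate->tsmdata = NULL;
}
```

A caller would alternate tsm_nextblock and tsm_nexttuple until each returns
its Invalid value, which mirrors the call order the chapter describes.
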
 doc/src/sgml/filelist.sgml           |   1 +
 doc/src/sgml/postgres.sgml           |   1 +
 doc/src/sgml/tablesample-method.sgml | 169 +++++++++++++++++++++++++++++++++++
 3 files changed, 171 insertions(+)
 create mode 100644 doc/src/sgml/tablesample-method.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 89fff77..23d932d 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -98,6 +98,7 @@
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
+<!ENTITY tablesample-method SYSTEM "tablesample-method.sgml">
 
 <!-- contrib information -->
 <!ENTITY contrib         SYSTEM "contrib.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index e378d69..dc1f020 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -250,6 +250,7 @@
   &gin;
   &brin;
   &storage;
+  &tablesample-method;
   &bki;
   &planstats;
 
diff --git a/doc/src/sgml/tablesample-method.sgml b/doc/src/sgml/tablesample-method.sgml
new file mode 100644
index 0000000..2d6d323
--- /dev/null
+++ b/doc/src/sgml/tablesample-method.sgml
@@ -0,0 +1,169 @@
+<!-- doc/src/sgml/tablesample-method.sgml -->
+
+<chapter id="tablesample-method">
+ <title>Writing A TABLESAMPLE Sampling Method</title>
+
+ <indexterm zone="tablesample-method">
+  <primary>tablesample method</primary>
+ </indexterm>
+
+ <para>
+  The <command>TABLESAMPLE</command> clause implementation in
+  <productname>PostgreSQL</> supports creating custom sampling methods.
+  These methods control what sample of the table will be returned when the
+  <command>TABLESAMPLE</command> clause is used.
+ </para>
+
+ <sect1 id="tablesample-method-functions">
+  <title>Tablesample Method Functions</title>
+
+  <para>
+   A tablesample method must provide the following set of functions:
+  </para>
+
+  <para>
+<programlisting>
+void
+tsm_init (SampleScanState *scanstate,
+         uint32 seed, ...);
+</programlisting>
+   Initialize the tablesample scan. The function is called at the beginning
+   of each relation scan.
+  </para>
+  <para>
+   Note that the first two parameters are required, but you can specify
+   additional parameters, which the <command>TABLESAMPLE</> clause then uses
+   to determine the user input required in the query itself.
+   This means that if your function specifies an additional float4 parameter
+   named percent, the user has to call the tablesample method with an
+   expression which evaluates (or can be coerced) to float4.
+   For example, this definition:
+<programlisting>
+tsm_init (SampleScanState *scanstate,
+          uint32 seed, float4 pct);
+</programlisting>
+   will lead to an SQL call like this:
+<programlisting>
+... TABLESAMPLE yourmethod(0.5) ...
+</programlisting>
+  </para>
+
+  <para>
+<programlisting>
+BlockNumber
+tsm_nextblock (SampleScanState *scanstate);
+</programlisting>
+   Returns the block number of the next page to be scanned. InvalidBlockNumber
+   should be returned if the sampling has reached the end of the relation.
+  </para>
+
+  <para>
+<programlisting>
+OffsetNumber
+tsm_nexttuple (SampleScanState *scanstate, BlockNumber blockno,
+               OffsetNumber maxoffset);
+</programlisting>
+   Returns the offset of the next tuple on the current page. InvalidOffsetNumber
+   should be returned if the sampling has reached the end of the page.
+  </para>
+
+  <para>
+<programlisting>
+void
+tsm_end (SampleScanState *scanstate);
+</programlisting>
+   The scan has finished; clean up any leftover state.
+  </para>
+
+  <para>
+<programlisting>
+void
+tsm_reset (SampleScanState *scanstate);
+</programlisting>
+   The relation needs to be scanned again; reset any tablesample method
+   state.
+  </para>
+
+  <para>
+<programlisting>
+void
+tsm_cost (PlannerInfo *root, Path *path, RelOptInfo *baserel,
+          List *args, BlockNumber *pages, double *tuples);
+</programlisting>
+   This function is used by the optimizer to choose the best plan and is also
+   used for the output of <command>EXPLAIN</>.
+  </para>
+
+  <para>
+   A tablesample method can also implement the following function to gain
+   more fine-grained control over sampling. This function is optional:
+  </para>
+
+  <para>
+<programlisting>
+bool
+tsm_examinetuple (SampleScanState *scanstate, BlockNumber blockno,
+                  HeapTuple tuple, bool visible);
+</programlisting>
+   This function enables the sampling method to examine the contents of the
+   tuple (for example, to collect some internal statistics). The return value
+   determines whether the tuple should be returned to the client.
+   Note that this function also receives invisible tuples, but it is not
+   allowed to return true for such a tuple (if it does,
+   <productname>PostgreSQL</> will raise an error).
+  </para>
+
+  <para>
+  As you can see, most of the tablesample method functions receive the
+  <structname>SampleScanState</> as their first parameter. This structure holds
+  the state of the current scan and also provides storage for the tablesample
+  method's state. It is defined as follows:
+<programlisting>
+typedef struct SampleScanState
+{
+    ScanState   ss;
+    FmgrInfo    tsminit;
+    FmgrInfo    tsmnextblock;
+    FmgrInfo    tsmnexttuple;
+    FmgrInfo    tsmexaminetuple;
+    FmgrInfo    tsmend;
+    FmgrInfo    tsmreset;
+    void       *tsmdata;
+} SampleScanState;
+</programlisting>
+  Here <structfield>ss</> is the <structname>ScanState</> itself. From it, you
+  can get <structfield>ss_currentRelation</> (the currently scanned relation)
+  and <structfield>ss_currentScanDesc</> (information about the scan), which
+  are usually useful in the <function>tsm_init</> function.
+  The <structfield>tsminit</>, <structfield>tsmnextblock</>,
+  <structfield>tsmnexttuple</>, <structfield>tsmexaminetuple</>,
+  <structfield>tsmend</> and <structfield>tsmreset</> fields refer to the
+  tablesample method functions and are used by the sample scan itself; the
+  method does not need to be concerned with them. <structfield>tsmdata</> can
+  be used by the method to store any state it needs during the scan.
+  </para>
+ </sect1>
+
+ <sect1 id="tablesample-method-sql">
+  <title>Tablesample Method Installation</title>
+
+  <para>
+   Once you have written and built the custom tablesample method, you can
+   install it using the SQL statement
+   <xref linkend="sql-createtablesamplemethod"> and remove it again using
+   <xref linkend="sql-droptablesamplemethod">.
+  </para>
+
+ </sect1>
+
+ <sect1 id="tablesample-method-example">
+  <title>Tablesample Method Example</title>
+
+  <para>
+   An example of how to implement a custom tablesample method can be found in
+   the <productname>PostgreSQL</> sources under the
+   <filename>src/test/modules/tablesample</> directory.
+  </para>
+ </sect1>
+
+</chapter>
-- 
1.9.1

