[PATCH] New command to monitor progression of long running queries
Hello,
I've implemented a new command named PROGRESS to monitor the progress of
long-running SQL queries in a backend process.
Use case
=======
A use case is shown in the example below, based on a table named t_10m with
10 million rows.
The table has been created with:
CREATE TABLE T_10M ( id integer, md5 text);
INSERT INTO T_10M SELECT generate_series(1,10000000) AS id,
md5(random()::text) AS md5;
1/ Start a first psql session to run long SQL queries:
[pgadm@rco ~]$ psql -A -d test
psql (10devel)
Type "help" for help.
test=#
The -A option selects unaligned output mode, so rows are written without formatting work.
Redirect output to a file in order to let the query run without terminal
interaction:
test=# \o out
Start a long running query:
test=# select * from t_10M order by md5;
2/ In a second psql session, list the backend pids and their SQL queries:
[pgadm@rco ~]$ psql -d test
psql (10devel)
Type "help" for help.
test=# select pid, query from pg_stat_activity ;
pid | query
-------+-------------------------------------------
19081 |
19084 |
19339 | select pid, query from pg_stat_activity ;
19341 | select * from t_10m order by md5;
19727 | select * from t_10m order by md5;
19726 | select * from t_10m order by md5;
19079 |
19078 |
19080 |
(9 rows)
test=#
Choose the pid of the backend running the long SQL query to be monitored.
The above example is a parallel SQL query; the lowest pid is that of the
main backend of the query.
test=# PROGRESS 19341;
PLAN PROGRESS
------------------------------------------------------------------------------------------
Gather Merge
  ->  Sort => dumping tuples to tapes
        rows r/w merge 0/0 rows r/w effective 0/2722972 0%
        Sort Key: md5
        ->  Parallel Seq Scan on t_10m => rows 2751606/3954135 69% blks 125938/161222 78%
(5 rows)
test=#
The query of the monitored backend is:
test=# select * from t_10M order by md5;
Because the table has 10 million rows, the sort spills to tapes.
Design of the command
=================
The design of the patch/command is:
- the user issues the "PROGRESS pid" command from a psql session. The pid is
that of the backend running the SQL query for which we want a progress
report. It can be determined from the pg_stat_activity view.
- upon receiving the "PROGRESS pid" command from psql, the monitoring
backend sends a signal to the backend whose process pid was given in the
PROGRESS command.
- the monitored backend receives the signal and notes the request, as for
any interrupt. It then continues executing its SQL query until interrupts
can be serviced.
- when the monitored process can service interrupts, it handles the
progress request by collecting its execution tree together with the
execution progress of each long-running node. The SQL query is paused at
this point; the progress of each node has already been accumulated while
the query was running. The execution tree is dumped into shared memory
pages allocated at server start. The monitored backend then sets a latch
on which the monitoring backend is waiting, and resumes executing its SQL
query.
- the monitoring backend collects the shared-memory data dumped by the
monitored backend and sends it to its psql session as a list of rows.
The PROGRESS command does not slow down the running query, because the
execution progress is only computed upon receiving a progress request,
which is expected to be infrequent.
The code heavily reuses that of the EXPLAIN command. In order to share as
much code as possible with EXPLAIN, the parts of the EXPLAIN code that deal
with reporting (quals, for instance) have been moved to a new report.c file
in the src/backend/commands folder. The code in report.c is shared between
explain.c and the PROGRESS command's source, which is in progress.c.
The progress reported by the PROGRESS command is given in terms of rows,
blocks, bytes and percentages. The values displayed depend on the node type
in the execution plan.
The current patch implements all the node types that could take a long
time:
- Sequential scan nodes with rows and blocks progress (node types T_SeqScan,
T_SampleScan, T_BitmapHeapScan, T_SubqueryScan, T_FunctionScan,
T_ValuesScan, T_CteScan, T_WorkTableScan)
- Tuple id scan node with rows and blocks progress (T_TidScan)
- Limit node with rows progress (T_Limit)
- Foreign and custom scan with rows and blocks progress (T_ForeignScan,
T_CustomScan)
- Index scan, index only scan and bitmap index scan with rows and blocks
progress
Patch
====
The diff stat of the patch is:
[root@rco pg]# git diff --stat master..
contrib/auto_explain/auto_explain.c | 5 +-
contrib/postgres_fdw/postgres_fdw.c | 13 +-
src/backend/access/heap/heapam.c | 2 +
src/backend/commands/Makefile | 3 +-
src/backend/commands/explain.c | 2834 ++++++++++++++---------------------------------------------------------------------------------------------------------
src/backend/commands/prepare.c | 5 +-
src/backend/commands/progress.c | 1314 +++++++++++++++++++++++++++++++++++++++++++++++++++
src/backend/commands/report.c | 2120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
src/backend/executor/execProcnode.c | 31 ++
src/backend/executor/nodeBitmapHeapscan.c | 13 +-
src/backend/executor/nodeIndexonlyscan.c | 13 +-
src/backend/executor/nodeIndexscan.c | 15 +-
src/backend/executor/nodeSamplescan.c | 12 +-
src/backend/executor/nodeSeqscan.c | 16 +-
src/backend/nodes/bitmapset.c | 19 +
src/backend/nodes/outfuncs.c | 245 ++++++++++
src/backend/parser/gram.y | 99 +++-
src/backend/postmaster/postmaster.c | 1 +
src/backend/storage/file/buffile.c | 47 ++
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procarray.c | 57 +++
src/backend/storage/ipc/procsignal.c | 4 +
src/backend/storage/lmgr/lwlock.c | 7 +-
src/backend/storage/lmgr/lwlocknames.txt | 1 +
src/backend/tcop/postgres.c | 10 +
src/backend/tcop/pquery.c | 25 +
src/backend/tcop/utility.c | 10 +
src/backend/utils/init/globals.c | 12 +
src/backend/utils/sort/tuplesort.c | 142 +++++-
src/backend/utils/sort/tuplestore.c | 73 ++-
src/include/commands/explain.h | 67 +--
src/include/commands/prepare.h | 2 +-
src/include/commands/report.h | 136 ++++++
src/include/executor/execdesc.h | 2 +
src/include/executor/progress.h | 52 ++
src/include/foreign/fdwapi.h | 10 +-
src/include/nodes/bitmapset.h | 1 +
src/include/nodes/execnodes.h | 3 +
src/include/nodes/extensible.h | 6 +-
src/include/nodes/nodes.h | 8 +
src/include/nodes/parsenodes.h | 11 +
src/include/nodes/plannodes.h | 11 +
src/include/parser/kwlist.h | 4 +
src/include/pgstat.h | 3 +-
src/include/storage/buffile.h | 8 +
src/include/storage/procarray.h | 3 +
src/include/storage/procsignal.h | 3 +
src/include/utils/tuplesort.h | 71 ++-
src/include/utils/tuplestore.h | 33 ++
49 files changed, 4979 insertions(+), 2606 deletions(-)
[root@rco pg]#
The PROGRESS command can be used with psql's \watch command, making it
handier to monitor a long-running query.
The default output format of the PROGRESS command is text. It can easily be
extended to JSON and XML, as for the EXPLAIN command.
The patch is based on commit 85a0781334a204c15c9c6ea9d3e6c75334c2beb6
(Date: Fri Apr 14 17:51:25 2017 -0400)
Use cases
========
Some further usage examples are shown below in the test_v1.txt file.
What do you make of this idea/patch?
Does it make sense?
Any suggestion is welcome.
The current patch is still a work in progress, but it is stable. It can be
used with regular queries. Utility commands are not supported for the
moment.
Documentation is not yet written.
Regards
Remi
Attachment: progress_v1.patch (text/x-patch)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 9213ffb..7defe9b 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -15,6 +15,7 @@
#include <limits.h>
#include "commands/explain.h"
+#include "commands/report.h"
#include "executor/instrument.h"
#include "utils/guc.h"
@@ -320,7 +321,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
msec = queryDesc->totaltime->total * 1000.0;
if (msec >= auto_explain_log_min_duration)
{
- ExplainState *es = NewExplainState();
+ ReportState *es = NewReportState();
es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
es->verbose = auto_explain_log_verbose;
@@ -330,7 +331,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
es->format = auto_explain_log_format;
ExplainBeginOutput(es);
- ExplainQueryText(es, queryDesc);
+ ReportQueryText(es, queryDesc);
ExplainPrintPlan(es, queryDesc);
if (es->analyze && auto_explain_log_triggers)
ExplainPrintTriggers(es, queryDesc);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 8d02243..68aef85 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
#include "catalog/pg_class.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+#include "commands/report.h"
#include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "funcapi.h"
@@ -323,14 +324,14 @@ static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
static void postgresEndDirectModify(ForeignScanState *node);
static void postgresExplainForeignScan(ForeignScanState *node,
- ExplainState *es);
+ ReportState *es);
static void postgresExplainForeignModify(ModifyTableState *mtstate,
ResultRelInfo *rinfo,
List *fdw_private,
int subplan_index,
- ExplainState *es);
+ ReportState *es);
static void postgresExplainDirectModify(ForeignScanState *node,
- ExplainState *es);
+ ReportState *es);
static bool postgresAnalyzeForeignTable(Relation relation,
AcquireSampleRowsFunc *func,
BlockNumber *totalpages);
@@ -2431,7 +2432,7 @@ postgresEndDirectModify(ForeignScanState *node)
* Produce extra output for EXPLAIN of a ForeignScan on a foreign table
*/
static void
-postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
+postgresExplainForeignScan(ForeignScanState *node, ReportState *es)
{
List *fdw_private;
char *sql;
@@ -2468,7 +2469,7 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
ResultRelInfo *rinfo,
List *fdw_private,
int subplan_index,
- ExplainState *es)
+ ReportState *es)
{
if (es->verbose)
{
@@ -2485,7 +2486,7 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
* foreign table directly
*/
static void
-postgresExplainDirectModify(ForeignScanState *node, ExplainState *es)
+postgresExplainDirectModify(ForeignScanState *node, ReportState *es)
{
List *fdw_private;
char *sql;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0c3e2b0..23aa929 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -232,6 +232,8 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
else
scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd);
+ //elog(LOG, "rs_nblocks (%p)=%u", scan, scan->rs_nblocks);
+
/*
* If the table is large relative to NBuffers, use a bulk-read access
* strategy and enable synchronized scanning (see syncscan.c). Although
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e..7198661 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -17,7 +17,8 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
dbcommands.o define.o discard.o dropcmds.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
- policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
+ policy.o portalcmds.o prepare.o progress.o proclang.o publicationcmds.o \
+ report.o \
schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
vacuum.o vacuumlazy.o variable.o view.o
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9359d0a..54bc455 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,10 +19,12 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "commands/report.h"
#include "executor/hashjoin.h"
#include "foreign/fdwapi.h"
#include "nodes/extensible.h"
#include "nodes/nodeFuncs.h"
+#include "nodes/plannodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
#include "parser/parsetree.h"
@@ -46,94 +48,16 @@ ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
/* Hook for plugins to get control in explain_get_index_name() */
explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
+static void ExplainOneQuery(Query *query, int cursorOptions, IntoClause *into,
+ ReportState *es, const char *queryString, ParamListInfo params, QueryEnvironment *queryEnv);
+static void report_triggers(ResultRelInfo *rInfo, bool show_relname, ReportState *es);
-/* OR-able flags for ExplainXMLTag() */
-#define X_OPENING 0
-#define X_CLOSING 1
-#define X_CLOSE_IMMEDIATE 2
-#define X_NOWHITESPACE 4
-
-static void ExplainOneQuery(Query *query, int cursorOptions,
- IntoClause *into, ExplainState *es,
- const char *queryString, ParamListInfo params,
- QueryEnvironment *queryEnv);
-static void report_triggers(ResultRelInfo *rInfo, bool show_relname,
- ExplainState *es);
static double elapsed_time(instr_time *starttime);
-static bool ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used);
-static void ExplainNode(PlanState *planstate, List *ancestors,
- const char *relationship, const char *plan_name,
- ExplainState *es);
-static void show_plan_tlist(PlanState *planstate, List *ancestors,
- ExplainState *es);
-static void show_expression(Node *node, const char *qlabel,
- PlanState *planstate, List *ancestors,
- bool useprefix, ExplainState *es);
-static void show_qual(List *qual, const char *qlabel,
- PlanState *planstate, List *ancestors,
- bool useprefix, ExplainState *es);
-static void show_scan_qual(List *qual, const char *qlabel,
- PlanState *planstate, List *ancestors,
- ExplainState *es);
-static void show_upper_qual(List *qual, const char *qlabel,
- PlanState *planstate, List *ancestors,
- ExplainState *es);
-static void show_sort_keys(SortState *sortstate, List *ancestors,
- ExplainState *es);
-static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
- ExplainState *es);
-static void show_agg_keys(AggState *astate, List *ancestors,
- ExplainState *es);
-static void show_grouping_sets(PlanState *planstate, Agg *agg,
- List *ancestors, ExplainState *es);
-static void show_grouping_set_keys(PlanState *planstate,
- Agg *aggnode, Sort *sortnode,
- List *context, bool useprefix,
- List *ancestors, ExplainState *es);
-static void show_group_keys(GroupState *gstate, List *ancestors,
- ExplainState *es);
-static void show_sort_group_keys(PlanState *planstate, const char *qlabel,
- int nkeys, AttrNumber *keycols,
- Oid *sortOperators, Oid *collations, bool *nullsFirst,
- List *ancestors, ExplainState *es);
-static void show_sortorder_options(StringInfo buf, Node *sortexpr,
- Oid sortOperator, Oid collation, bool nullsFirst);
-static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
- List *ancestors, ExplainState *es);
-static void show_sort_info(SortState *sortstate, ExplainState *es);
-static void show_hash_info(HashState *hashstate, ExplainState *es);
-static void show_tidbitmap_info(BitmapHeapScanState *planstate,
- ExplainState *es);
-static void show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es);
-static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
-static const char *explain_get_index_name(Oid indexId);
-static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
-static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
- ExplainState *es);
-static void ExplainScanTarget(Scan *plan, ExplainState *es);
-static void ExplainModifyTarget(ModifyTable *plan, ExplainState *es);
-static void ExplainTargetRel(Plan *plan, Index rti, ExplainState *es);
-static void show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
- ExplainState *es);
-static void ExplainMemberNodes(List *plans, PlanState **planstates,
- List *ancestors, ExplainState *es);
-static void ExplainSubPlans(List *plans, List *ancestors,
- const char *relationship, ExplainState *es);
-static void ExplainCustomChildren(CustomScanState *css,
- List *ancestors, ExplainState *es);
-static void ExplainProperty(const char *qlabel, const char *value,
- bool numeric, ExplainState *es);
-static void ExplainOpenGroup(const char *objtype, const char *labelname,
- bool labeled, ExplainState *es);
-static void ExplainCloseGroup(const char *objtype, const char *labelname,
- bool labeled, ExplainState *es);
-static void ExplainDummyGroup(const char *objtype, const char *labelname,
- ExplainState *es);
-static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
-static void ExplainJSONLineEnding(ExplainState *es);
-static void ExplainYAMLLineStarting(ExplainState *es);
-static void escape_yaml(StringInfo buf, const char *str);
+static void ExplainNode(PlanState *planstate, List *ancestors, const char *relationship, const char *plan_name, ReportState *es);
+static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir, ReportState *es);
+static void ExplainScanTarget(Scan *plan, ReportState *es);
+static void ExplainModifyTarget(ModifyTable *plan, ReportState *es);
+static void ExplainTargetRel(Plan *plan, Index rti, ReportState *es);
@@ -146,13 +70,15 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
DestReceiver *dest)
{
- ExplainState *es = NewExplainState();
+ ReportState *es = CreateReportState(0);
TupOutputState *tstate;
List *rewritten;
ListCell *lc;
bool timing_set = false;
bool summary_set = false;
+ SetReportStateCosts(es, true);
+
/* Parse options list. */
foreach(lc, stmt->options)
{
@@ -181,13 +107,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
char *p = defGetString(opt);
if (strcmp(p, "text") == 0)
- es->format = EXPLAIN_FORMAT_TEXT;
+ es->format = REPORT_FORMAT_TEXT;
else if (strcmp(p, "xml") == 0)
- es->format = EXPLAIN_FORMAT_XML;
+ es->format = REPORT_FORMAT_XML;
else if (strcmp(p, "json") == 0)
- es->format = EXPLAIN_FORMAT_JSON;
+ es->format = REPORT_FORMAT_JSON;
else if (strcmp(p, "yaml") == 0)
- es->format = EXPLAIN_FORMAT_YAML;
+ es->format = REPORT_FORMAT_YAML;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -235,7 +161,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
rewritten = QueryRewrite(castNode(Query, copyObject(stmt->query)));
/* emit opening boilerplate */
- ExplainBeginOutput(es);
+ ReportBeginOutput(es);
if (rewritten == NIL)
{
@@ -243,7 +169,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
* In the case of an INSTEAD NOTHING, tell at least that. But in
* non-text format, the output is delimited, so this isn't necessary.
*/
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
appendStringInfoString(es->str, "Query rewrites to nothing\n");
}
else
@@ -259,17 +185,17 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
/* Separate plans with an appropriate separator */
if (lnext(l) != NULL)
- ExplainSeparatePlans(es);
+ ReportSeparatePlans(es);
}
}
/* emit closing boilerplate */
- ExplainEndOutput(es);
+ ReportEndOutput(es);
Assert(es->indent == 0);
/* output tuples */
tstate = begin_tup_output_tupdesc(dest, ExplainResultDesc(stmt));
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
do_text_output_multiline(tstate, es->str->data);
else
do_text_output_oneline(tstate, es->str->data);
@@ -279,22 +205,6 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
}
/*
- * Create a new ExplainState struct initialized with default options.
- */
-ExplainState *
-NewExplainState(void)
-{
- ExplainState *es = (ExplainState *) palloc0(sizeof(ExplainState));
-
- /* Set default options (most fields can be left as zeroes). */
- es->costs = true;
- /* Prepare output buffer. */
- es->str = makeStringInfo();
-
- return es;
-}
-
-/*
* ExplainResultDesc -
* construct the result tupledesc for an EXPLAIN
*/
@@ -303,16 +213,15 @@ ExplainResultDesc(ExplainStmt *stmt)
{
TupleDesc tupdesc;
ListCell *lc;
- Oid result_type = TEXTOID;
+ Oid result_type = TEXTOID;
/* Check for XML format option */
foreach(lc, stmt->options)
{
DefElem *opt = (DefElem *) lfirst(lc);
- if (strcmp(opt->defname, "format") == 0)
- {
- char *p = defGetString(opt);
+ if (strcmp(opt->defname, "format") == 0) {
+ char* p = defGetString(opt);
if (strcmp(p, "xml") == 0)
result_type = XMLOID;
@@ -326,8 +235,8 @@ ExplainResultDesc(ExplainStmt *stmt)
/* Need a tuple descriptor representing a single TEXT or XML column */
tupdesc = CreateTemplateTupleDesc(1, false);
- TupleDescInitEntry(tupdesc, (AttrNumber) 1, "QUERY PLAN",
- result_type, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 1, "QUERY PLAN", result_type, -1, 0);
+
return tupdesc;
}
@@ -339,7 +248,7 @@ ExplainResultDesc(ExplainStmt *stmt)
*/
static void
ExplainOneQuery(Query *query, int cursorOptions,
- IntoClause *into, ExplainState *es,
+ IntoClause *into, ReportState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv)
{
@@ -387,7 +296,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
* EXPLAIN EXECUTE case.
*/
void
-ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
+ExplainOneUtility(Node *utilityStmt, IntoClause *into, ReportState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv)
{
@@ -436,18 +345,18 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
queryString, params, queryEnv);
else if (IsA(utilityStmt, NotifyStmt))
{
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
appendStringInfoString(es->str, "NOTIFY\n");
else
- ExplainDummyGroup("Notify", NULL, es);
+ ReportDummyGroup("Notify", NULL, es);
}
else
{
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
appendStringInfoString(es->str,
"Utility statements have no plan structure\n");
else
- ExplainDummyGroup("Utility Statement", NULL, es);
+ ReportDummyGroup("Utility Statement", NULL, es);
}
}
@@ -464,7 +373,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ReportState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration)
{
@@ -545,7 +454,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
totaltime += elapsed_time(&starttime);
}
- ExplainOpenGroup("Query", NULL, true, es);
+ ReportOpenGroup("Query", NULL, true, es);
/* Create textual dump of plan tree */
ExplainPrintPlan(es, queryDesc);
@@ -554,11 +463,11 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
{
double plantime = INSTR_TIME_GET_DOUBLE(*planduration);
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
appendStringInfo(es->str, "Planning time: %.3f ms\n",
1000.0 * plantime);
else
- ExplainPropertyFloat("Planning Time", 1000.0 * plantime, 3, es);
+ ReportPropertyFloat("Planning Time", 1000.0 * plantime, 3, es);
}
/* Print info about runtime of triggers */
@@ -591,15 +500,15 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
if (es->summary && es->analyze)
{
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
appendStringInfo(es->str, "Execution time: %.3f ms\n",
1000.0 * totaltime);
else
- ExplainPropertyFloat("Execution Time", 1000.0 * totaltime,
+ ReportPropertyFloat("Execution Time", 1000.0 * totaltime,
3, es);
}
- ExplainCloseGroup("Query", NULL, true, es);
+ ReportCloseGroup("Query", NULL, true, es);
}
/*
@@ -614,19 +523,20 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
* NB: will not work on utility statements
*/
void
-ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc)
+ExplainPrintPlan(ReportState *es, QueryDesc *queryDesc)
{
Bitmapset *rels_used = NULL;
PlanState *ps;
- /* Set up ExplainState fields associated with this plan tree */
+ /* Set up ReportState fields associated with this plan tree */
Assert(queryDesc->plannedstmt != NULL);
es->pstmt = queryDesc->plannedstmt;
es->rtable = queryDesc->plannedstmt->rtable;
- ExplainPreScanNode(queryDesc->planstate, &rels_used);
+
+ ReportPreScanNode(queryDesc->planstate, &rels_used);
+
es->rtable_names = select_rtable_names_for_explain(es->rtable, rels_used);
- es->deparse_cxt = deparse_context_for_plan_rtable(es->rtable,
- es->rtable_names);
+ es->deparse_cxt = deparse_context_for_plan_rtable(es->rtable, es->rtable_names);
es->printed_subplans = NULL;
/*
@@ -651,7 +561,7 @@ ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc)
* initialized here.
*/
void
-ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc)
+ExplainPrintTriggers(ReportState *es, QueryDesc *queryDesc)
{
ResultRelInfo *rInfo;
bool show_relname;
@@ -660,7 +570,7 @@ ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc)
int nr;
ListCell *l;
- ExplainOpenGroup("Triggers", "Triggers", false, es);
+ ReportOpenGroup("Triggers", "Triggers", false, es);
show_relname = (numrels > 1 || targrels != NIL);
rInfo = queryDesc->estate->es_result_relations;
@@ -673,22 +583,7 @@ ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc)
report_triggers(rInfo, show_relname, es);
}
- ExplainCloseGroup("Triggers", "Triggers", false, es);
-}
-
-/*
- * ExplainQueryText -
- * add a "Query Text" node that contains the actual text of the query
- *
- * The caller should have set up the options fields of *es, as well as
- * initializing the output buffer es->str.
- *
- */
-void
-ExplainQueryText(ExplainState *es, QueryDesc *queryDesc)
-{
- if (queryDesc->sourceText)
- ExplainPropertyText("Query Text", queryDesc->sourceText, es);
+ ReportCloseGroup("Triggers", "Triggers", false, es);
}
/*
@@ -696,7 +591,7 @@ ExplainQueryText(ExplainState *es, QueryDesc *queryDesc)
* report execution stats for a single relation's triggers
*/
static void
-report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
+report_triggers(ResultRelInfo *rInfo, bool show_relname, ReportState *es)
{
int nt;
@@ -719,7 +614,7 @@ report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
if (instr->ntuples == 0)
continue;
- ExplainOpenGroup("Trigger", NULL, true, es);
+ ReportOpenGroup("Trigger", NULL, true, es);
relname = RelationGetRelationName(rInfo->ri_RelationDesc);
if (OidIsValid(trig->tgconstraint))
@@ -730,7 +625,7 @@ report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
* constraint name unless VERBOSE is specified. In non-text formats
* we just print everything.
*/
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
{
if (es->verbose || conname == NULL)
appendStringInfo(es->str, "Trigger %s", trig->tgname);
@@ -748,19 +643,19 @@ report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
}
else
{
- ExplainPropertyText("Trigger Name", trig->tgname, es);
+ ReportPropertyText("Trigger Name", trig->tgname, es);
if (conname)
- ExplainPropertyText("Constraint Name", conname, es);
- ExplainPropertyText("Relation", relname, es);
+ ReportPropertyText("Constraint Name", conname, es);
+ ReportPropertyText("Relation", relname, es);
if (es->timing)
- ExplainPropertyFloat("Time", 1000.0 * instr->total, 3, es);
- ExplainPropertyFloat("Calls", instr->ntuples, 0, es);
+ ReportPropertyFloat("Time", 1000.0 * instr->total, 3, es);
+ ReportPropertyFloat("Calls", instr->ntuples, 0, es);
}
if (conname)
pfree(conname);
- ExplainCloseGroup("Trigger", NULL, true, es);
+ ReportCloseGroup("Trigger", NULL, true, es);
}
}
@@ -776,60 +671,6 @@ elapsed_time(instr_time *starttime)
}
/*
- * ExplainPreScanNode -
- * Prescan the planstate tree to identify which RTEs are referenced
- *
- * Adds the relid of each referenced RTE to *rels_used. The result controls
- * which RTEs are assigned aliases by select_rtable_names_for_explain.
- * This ensures that we don't confusingly assign un-suffixed aliases to RTEs
- * that never appear in the EXPLAIN output (such as inheritance parents).
- */
-static bool
-ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
-{
- Plan *plan = planstate->plan;
-
- switch (nodeTag(plan))
- {
- case T_SeqScan:
- case T_SampleScan:
- case T_IndexScan:
- case T_IndexOnlyScan:
- case T_BitmapHeapScan:
- case T_TidScan:
- case T_SubqueryScan:
- case T_FunctionScan:
- case T_TableFuncScan:
- case T_ValuesScan:
- case T_CteScan:
- case T_NamedTuplestoreScan:
- case T_WorkTableScan:
- *rels_used = bms_add_member(*rels_used,
- ((Scan *) plan)->scanrelid);
- break;
- case T_ForeignScan:
- *rels_used = bms_add_members(*rels_used,
- ((ForeignScan *) plan)->fs_relids);
- break;
- case T_CustomScan:
- *rels_used = bms_add_members(*rels_used,
- ((CustomScan *) plan)->custom_relids);
- break;
- case T_ModifyTable:
- *rels_used = bms_add_member(*rels_used,
- ((ModifyTable *) plan)->nominalRelation);
- if (((ModifyTable *) plan)->exclRelRTI)
- *rels_used = bms_add_member(*rels_used,
- ((ModifyTable *) plan)->exclRelRTI);
- break;
- default:
- break;
- }
-
- return planstate_tree_walker(planstate, ExplainPreScanNode, rels_used);
-}
-
-/*
* ExplainNode -
* Appends a description of a plan tree to es->str
*
@@ -847,286 +688,27 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
* In text format, es->indent is controlled in this function since we only
* want it to change at plan-node boundaries. In non-text formats, es->indent
* corresponds to the nesting depth of logical output groups, and therefore
- * is controlled by ExplainOpenGroup/ExplainCloseGroup.
+ * is controlled by ReportOpenGroup/ReportCloseGroup.
*/
static void
ExplainNode(PlanState *planstate, List *ancestors,
const char *relationship, const char *plan_name,
- ExplainState *es)
+ ReportState *es)
{
Plan *plan = planstate->plan;
- const char *pname; /* node type name for text output */
- const char *sname; /* node type name for non-text output */
- const char *strategy = NULL;
- const char *partialmode = NULL;
- const char *operation = NULL;
- const char *custom_name = NULL;
- int save_indent = es->indent;
- bool haschildren;
-
- switch (nodeTag(plan))
- {
- case T_Result:
- pname = sname = "Result";
- break;
- case T_ProjectSet:
- pname = sname = "ProjectSet";
- break;
- case T_ModifyTable:
- sname = "ModifyTable";
- switch (((ModifyTable *) plan)->operation)
- {
- case CMD_INSERT:
- pname = operation = "Insert";
- break;
- case CMD_UPDATE:
- pname = operation = "Update";
- break;
- case CMD_DELETE:
- pname = operation = "Delete";
- break;
- default:
- pname = "???";
- break;
- }
- break;
- case T_Append:
- pname = sname = "Append";
- break;
- case T_MergeAppend:
- pname = sname = "Merge Append";
- break;
- case T_RecursiveUnion:
- pname = sname = "Recursive Union";
- break;
- case T_BitmapAnd:
- pname = sname = "BitmapAnd";
- break;
- case T_BitmapOr:
- pname = sname = "BitmapOr";
- break;
- case T_NestLoop:
- pname = sname = "Nested Loop";
- break;
- case T_MergeJoin:
- pname = "Merge"; /* "Join" gets added by jointype switch */
- sname = "Merge Join";
- break;
- case T_HashJoin:
- pname = "Hash"; /* "Join" gets added by jointype switch */
- sname = "Hash Join";
- break;
- case T_SeqScan:
- pname = sname = "Seq Scan";
- break;
- case T_SampleScan:
- pname = sname = "Sample Scan";
- break;
- case T_Gather:
- pname = sname = "Gather";
- break;
- case T_GatherMerge:
- pname = sname = "Gather Merge";
- break;
- case T_IndexScan:
- pname = sname = "Index Scan";
- break;
- case T_IndexOnlyScan:
- pname = sname = "Index Only Scan";
- break;
- case T_BitmapIndexScan:
- pname = sname = "Bitmap Index Scan";
- break;
- case T_BitmapHeapScan:
- pname = sname = "Bitmap Heap Scan";
- break;
- case T_TidScan:
- pname = sname = "Tid Scan";
- break;
- case T_SubqueryScan:
- pname = sname = "Subquery Scan";
- break;
- case T_FunctionScan:
- pname = sname = "Function Scan";
- break;
- case T_TableFuncScan:
- pname = sname = "Table Function Scan";
- break;
- case T_ValuesScan:
- pname = sname = "Values Scan";
- break;
- case T_CteScan:
- pname = sname = "CTE Scan";
- break;
- case T_NamedTuplestoreScan:
- pname = sname = "Named Tuplestore Scan";
- break;
- case T_WorkTableScan:
- pname = sname = "WorkTable Scan";
- break;
- case T_ForeignScan:
- sname = "Foreign Scan";
- switch (((ForeignScan *) plan)->operation)
- {
- case CMD_SELECT:
- pname = "Foreign Scan";
- operation = "Select";
- break;
- case CMD_INSERT:
- pname = "Foreign Insert";
- operation = "Insert";
- break;
- case CMD_UPDATE:
- pname = "Foreign Update";
- operation = "Update";
- break;
- case CMD_DELETE:
- pname = "Foreign Delete";
- operation = "Delete";
- break;
- default:
- pname = "???";
- break;
- }
- break;
- case T_CustomScan:
- sname = "Custom Scan";
- custom_name = ((CustomScan *) plan)->methods->CustomName;
- if (custom_name)
- pname = psprintf("Custom Scan (%s)", custom_name);
- else
- pname = sname;
- break;
- case T_Material:
- pname = sname = "Materialize";
- break;
- case T_Sort:
- pname = sname = "Sort";
- break;
- case T_Group:
- pname = sname = "Group";
- break;
- case T_Agg:
- {
- Agg *agg = (Agg *) plan;
- sname = "Aggregate";
- switch (agg->aggstrategy)
- {
- case AGG_PLAIN:
- pname = "Aggregate";
- strategy = "Plain";
- break;
- case AGG_SORTED:
- pname = "GroupAggregate";
- strategy = "Sorted";
- break;
- case AGG_HASHED:
- pname = "HashAggregate";
- strategy = "Hashed";
- break;
- case AGG_MIXED:
- pname = "MixedAggregate";
- strategy = "Mixed";
- break;
- default:
- pname = "Aggregate ???";
- strategy = "???";
- break;
- }
+ PlanInfo info;
+ int save_indent = es->indent;
+ bool haschildren;
+ int ret;
- if (DO_AGGSPLIT_SKIPFINAL(agg->aggsplit))
- {
- partialmode = "Partial";
- pname = psprintf("%s %s", partialmode, pname);
- }
- else if (DO_AGGSPLIT_COMBINE(agg->aggsplit))
- {
- partialmode = "Finalize";
- pname = psprintf("%s %s", partialmode, pname);
- }
- else
- partialmode = "Simple";
- }
- break;
- case T_WindowAgg:
- pname = sname = "WindowAgg";
- break;
- case T_Unique:
- pname = sname = "Unique";
- break;
- case T_SetOp:
- sname = "SetOp";
- switch (((SetOp *) plan)->strategy)
- {
- case SETOP_SORTED:
- pname = "SetOp";
- strategy = "Sorted";
- break;
- case SETOP_HASHED:
- pname = "HashSetOp";
- strategy = "Hashed";
- break;
- default:
- pname = "SetOp ???";
- strategy = "???";
- break;
- }
- break;
- case T_LockRows:
- pname = sname = "LockRows";
- break;
- case T_Limit:
- pname = sname = "Limit";
- break;
- case T_Hash:
- pname = sname = "Hash";
- break;
- default:
- pname = sname = "???";
- break;
+ ret = planNodeInfo(plan, &info);
+ if (ret != 0) {
+ elog(LOG, "unknown node type for plan");
}
- ExplainOpenGroup("Plan",
- relationship ? NULL : "Plan",
- true, es);
-
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- if (plan_name)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "%s\n", plan_name);
- es->indent++;
- }
- if (es->indent)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfoString(es->str, "-> ");
- es->indent += 2;
- }
- if (plan->parallel_aware)
- appendStringInfoString(es->str, "Parallel ");
- appendStringInfoString(es->str, pname);
- es->indent++;
- }
- else
- {
- ExplainPropertyText("Node Type", sname, es);
- if (strategy)
- ExplainPropertyText("Strategy", strategy, es);
- if (partialmode)
- ExplainPropertyText("Partial Mode", partialmode, es);
- if (operation)
- ExplainPropertyText("Operation", operation, es);
- if (relationship)
- ExplainPropertyText("Parent Relationship", relationship, es);
- if (plan_name)
- ExplainPropertyText("Subplan Name", plan_name, es);
- if (custom_name)
- ExplainPropertyText("Custom Plan Provider", custom_name, es);
- ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es);
- }
+ ReportOpenGroup("Plan", relationship ? NULL : "Plan", true, es);
+ ReportProperties(plan, &info, plan_name, relationship, es);
switch (nodeTag(plan))
{
@@ -1173,10 +755,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *indexname =
explain_get_index_name(bitmapindexscan->indexid);
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
appendStringInfo(es->str, " on %s", indexname);
else
- ExplainPropertyText("Index Name", indexname, es);
+ ReportPropertyText("Index Name", indexname, es);
}
break;
case T_ModifyTable:
@@ -1212,7 +794,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
jointype = "???";
break;
}
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
{
/*
* For historical reasons, the join type is interpolated
@@ -1224,7 +806,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
appendStringInfoString(es->str, " Join");
}
else
- ExplainPropertyText("Join Type", jointype, es);
+ ReportPropertyText("Join Type", jointype, es);
}
break;
case T_SetOp:
@@ -1249,10 +831,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
setopcmd = "???";
break;
}
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
appendStringInfo(es->str, " %s", setopcmd);
else
- ExplainPropertyText("Command", setopcmd, es);
+ ReportPropertyText("Command", setopcmd, es);
}
break;
default:
@@ -1261,7 +843,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (es->costs)
{
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
{
appendStringInfo(es->str, " (cost=%.2f..%.2f rows=%.0f width=%d)",
plan->startup_cost, plan->total_cost,
@@ -1269,10 +851,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
else
{
- ExplainPropertyFloat("Startup Cost", plan->startup_cost, 2, es);
- ExplainPropertyFloat("Total Cost", plan->total_cost, 2, es);
- ExplainPropertyFloat("Plan Rows", plan->plan_rows, 0, es);
- ExplainPropertyInteger("Plan Width", plan->plan_width, es);
+ ReportPropertyFloat("Startup Cost", plan->startup_cost, 2, es);
+ ReportPropertyFloat("Total Cost", plan->total_cost, 2, es);
+ ReportPropertyFloat("Plan Rows", plan->plan_rows, 0, es);
+ ReportPropertyInteger("Plan Width", plan->plan_width, es);
}
}
@@ -1297,7 +879,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
double total_sec = 1000.0 * planstate->instrument->total / nloops;
double rows = planstate->instrument->ntuples / nloops;
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
{
if (es->timing)
appendStringInfo(es->str,
@@ -1312,32 +894,31 @@ ExplainNode(PlanState *planstate, List *ancestors,
{
if (es->timing)
{
- ExplainPropertyFloat("Actual Startup Time", startup_sec, 3, es);
- ExplainPropertyFloat("Actual Total Time", total_sec, 3, es);
+ ReportPropertyFloat("Actual Startup Time", startup_sec, 3, es);
+ ReportPropertyFloat("Actual Total Time", total_sec, 3, es);
}
- ExplainPropertyFloat("Actual Rows", rows, 0, es);
- ExplainPropertyFloat("Actual Loops", nloops, 0, es);
+ ReportPropertyFloat("Actual Rows", rows, 0, es);
+ ReportPropertyFloat("Actual Loops", nloops, 0, es);
}
}
else if (es->analyze)
{
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
appendStringInfoString(es->str, " (never executed)");
else
{
if (es->timing)
{
- ExplainPropertyFloat("Actual Startup Time", 0.0, 3, es);
- ExplainPropertyFloat("Actual Total Time", 0.0, 3, es);
+ ReportPropertyFloat("Actual Startup Time", 0.0, 3, es);
+ ReportPropertyFloat("Actual Total Time", 0.0, 3, es);
}
- ExplainPropertyFloat("Actual Rows", 0.0, 0, es);
- ExplainPropertyFloat("Actual Loops", 0.0, 0, es);
+ ReportPropertyFloat("Actual Rows", 0.0, 0, es);
+ ReportPropertyFloat("Actual Loops", 0.0, 0, es);
}
}
/* in text format, first line ends here */
- if (es->format == EXPLAIN_FORMAT_TEXT)
- appendStringInfoChar(es->str, '\n');
+ ReportNewLine(es);
/* target list */
if (es->verbose)
@@ -1350,9 +931,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_MergeJoin:
case T_HashJoin:
/* try not to be too chatty about this in text mode */
- if (es->format != EXPLAIN_FORMAT_TEXT ||
+ if (es->format != REPORT_FORMAT_TEXT ||
(es->verbose && ((Join *) plan)->inner_unique))
- ExplainPropertyBool("Inner Unique",
+ ReportPropertyBool("Inner Unique",
((Join *) plan)->inner_unique,
es);
break;
@@ -1361,260 +942,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
/* quals, sort keys, etc */
- switch (nodeTag(plan))
- {
- case T_IndexScan:
- show_scan_qual(((IndexScan *) plan)->indexqualorig,
- "Index Cond", planstate, ancestors, es);
- if (((IndexScan *) plan)->indexqualorig)
- show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
- show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
- "Order By", planstate, ancestors, es);
- show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- break;
- case T_IndexOnlyScan:
- show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
- "Index Cond", planstate, ancestors, es);
- if (((IndexOnlyScan *) plan)->indexqual)
- show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
- show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
- "Order By", planstate, ancestors, es);
- show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- if (es->analyze)
- ExplainPropertyLong("Heap Fetches",
- ((IndexOnlyScanState *) planstate)->ioss_HeapFetches, es);
- break;
- case T_BitmapIndexScan:
- show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
- "Index Cond", planstate, ancestors, es);
- break;
- case T_BitmapHeapScan:
- show_scan_qual(((BitmapHeapScan *) plan)->bitmapqualorig,
- "Recheck Cond", planstate, ancestors, es);
- if (((BitmapHeapScan *) plan)->bitmapqualorig)
- show_instrumentation_count("Rows Removed by Index Recheck", 2,
- planstate, es);
- show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- if (es->analyze)
- show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
- break;
- case T_SampleScan:
- show_tablesample(((SampleScan *) plan)->tablesample,
- planstate, ancestors, es);
- /* FALL THRU to print additional fields the same as SeqScan */
- case T_SeqScan:
- case T_ValuesScan:
- case T_CteScan:
- case T_NamedTuplestoreScan:
- case T_WorkTableScan:
- case T_SubqueryScan:
- show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- break;
- case T_Gather:
- {
- Gather *gather = (Gather *) plan;
-
- show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- ExplainPropertyInteger("Workers Planned",
- gather->num_workers, es);
- if (es->analyze)
- {
- int nworkers;
-
- nworkers = ((GatherState *) planstate)->nworkers_launched;
- ExplainPropertyInteger("Workers Launched",
- nworkers, es);
- }
- if (gather->single_copy || es->format != EXPLAIN_FORMAT_TEXT)
- ExplainPropertyBool("Single Copy", gather->single_copy, es);
- }
- break;
- case T_GatherMerge:
- {
- GatherMerge *gm = (GatherMerge *) plan;
-
- show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- ExplainPropertyInteger("Workers Planned",
- gm->num_workers, es);
- if (es->analyze)
- {
- int nworkers;
-
- nworkers = ((GatherMergeState *) planstate)->nworkers_launched;
- ExplainPropertyInteger("Workers Launched",
- nworkers, es);
- }
- }
- break;
- case T_FunctionScan:
- if (es->verbose)
- {
- List *fexprs = NIL;
- ListCell *lc;
-
- foreach(lc, ((FunctionScan *) plan)->functions)
- {
- RangeTblFunction *rtfunc = (RangeTblFunction *) lfirst(lc);
-
- fexprs = lappend(fexprs, rtfunc->funcexpr);
- }
- /* We rely on show_expression to insert commas as needed */
- show_expression((Node *) fexprs,
- "Function Call", planstate, ancestors,
- es->verbose, es);
- }
- show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- break;
- case T_TableFuncScan:
- if (es->verbose)
- {
- TableFunc *tablefunc = ((TableFuncScan *) plan)->tablefunc;
-
- show_expression((Node *) tablefunc,
- "Table Function Call", planstate, ancestors,
- es->verbose, es);
- }
- show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- break;
- case T_TidScan:
- {
- /*
- * The tidquals list has OR semantics, so be sure to show it
- * as an OR condition.
- */
- List *tidquals = ((TidScan *) plan)->tidquals;
-
- if (list_length(tidquals) > 1)
- tidquals = list_make1(make_orclause(tidquals));
- show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
- show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- }
- break;
- case T_ForeignScan:
- show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- show_foreignscan_info((ForeignScanState *) planstate, es);
- break;
- case T_CustomScan:
- {
- CustomScanState *css = (CustomScanState *) planstate;
-
- show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- if (css->methods->ExplainCustomScan)
- css->methods->ExplainCustomScan(css, ancestors, es);
- }
- break;
- case T_NestLoop:
- show_upper_qual(((NestLoop *) plan)->join.joinqual,
- "Join Filter", planstate, ancestors, es);
- if (((NestLoop *) plan)->join.joinqual)
- show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
- show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
- break;
- case T_MergeJoin:
- show_upper_qual(((MergeJoin *) plan)->mergeclauses,
- "Merge Cond", planstate, ancestors, es);
- show_upper_qual(((MergeJoin *) plan)->join.joinqual,
- "Join Filter", planstate, ancestors, es);
- if (((MergeJoin *) plan)->join.joinqual)
- show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
- show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
- break;
- case T_HashJoin:
- show_upper_qual(((HashJoin *) plan)->hashclauses,
- "Hash Cond", planstate, ancestors, es);
- show_upper_qual(((HashJoin *) plan)->join.joinqual,
- "Join Filter", planstate, ancestors, es);
- if (((HashJoin *) plan)->join.joinqual)
- show_instrumentation_count("Rows Removed by Join Filter", 1,
- planstate, es);
- show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 2,
- planstate, es);
- break;
- case T_Agg:
- show_agg_keys(castNode(AggState, planstate), ancestors, es);
- show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- break;
- case T_Group:
- show_group_keys(castNode(GroupState, planstate), ancestors, es);
- show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- break;
- case T_Sort:
- show_sort_keys(castNode(SortState, planstate), ancestors, es);
- show_sort_info(castNode(SortState, planstate), es);
- break;
- case T_MergeAppend:
- show_merge_append_keys(castNode(MergeAppendState, planstate),
- ancestors, es);
- break;
- case T_Result:
- show_upper_qual((List *) ((Result *) plan)->resconstantqual,
- "One-Time Filter", planstate, ancestors, es);
- show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
- if (plan->qual)
- show_instrumentation_count("Rows Removed by Filter", 1,
- planstate, es);
- break;
- case T_ModifyTable:
- show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
- es);
- break;
- case T_Hash:
- show_hash_info(castNode(HashState, planstate), es);
- break;
- default:
- break;
- }
+ show_control_qual(planstate, ancestors, es);
/* Show buffer usage */
if (es->buffers && planstate->instrument)
@@ -1637,11 +965,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (nloops <= 0)
continue;
+
startup_sec = 1000.0 * instrument->startup / nloops;
total_sec = 1000.0 * instrument->total / nloops;
rows = instrument->ntuples / nloops;
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
{
appendStringInfoSpaces(es->str, es->indent * 2);
appendStringInfo(es->str, "Worker %d: ", n);
@@ -1662,100 +991,80 @@ ExplainNode(PlanState *planstate, List *ancestors,
{
if (!opened_group)
{
- ExplainOpenGroup("Workers", "Workers", false, es);
+ ReportOpenGroup("Workers", "Workers", false, es);
opened_group = true;
}
- ExplainOpenGroup("Worker", NULL, true, es);
- ExplainPropertyInteger("Worker Number", n, es);
+ ReportOpenGroup("Worker", NULL, true, es);
+ ReportPropertyInteger("Worker Number", n, es);
if (es->timing)
{
- ExplainPropertyFloat("Actual Startup Time", startup_sec, 3, es);
- ExplainPropertyFloat("Actual Total Time", total_sec, 3, es);
+ ReportPropertyFloat("Actual Startup Time", startup_sec, 3, es);
+ ReportPropertyFloat("Actual Total Time", total_sec, 3, es);
}
- ExplainPropertyFloat("Actual Rows", rows, 0, es);
- ExplainPropertyFloat("Actual Loops", nloops, 0, es);
+ ReportPropertyFloat("Actual Rows", rows, 0, es);
+ ReportPropertyFloat("Actual Loops", nloops, 0, es);
if (es->buffers)
show_buffer_usage(es, &instrument->bufusage);
- ExplainCloseGroup("Worker", NULL, true, es);
+ ReportCloseGroup("Worker", NULL, true, es);
}
}
if (opened_group)
- ExplainCloseGroup("Workers", "Workers", false, es);
+ ReportCloseGroup("Workers", "Workers", false, es);
}
/* Get ready to display the child plans */
- haschildren = planstate->initPlan ||
- outerPlanState(planstate) ||
- innerPlanState(planstate) ||
- IsA(plan, ModifyTable) ||
- IsA(plan, Append) ||
- IsA(plan, MergeAppend) ||
- IsA(plan, BitmapAnd) ||
- IsA(plan, BitmapOr) ||
- IsA(plan, SubqueryScan) ||
- (IsA(planstate, CustomScanState) &&
- ((CustomScanState *) planstate)->custom_ps != NIL) ||
- planstate->subPlan;
+ haschildren = ReportHasChildren(plan, planstate);
if (haschildren)
{
- ExplainOpenGroup("Plans", "Plans", false, es);
+ ReportOpenGroup("Plans", "Plans", false, es);
/* Pass current PlanState as head of ancestors list for children */
ancestors = lcons(planstate, ancestors);
}
/* initPlan-s */
if (planstate->initPlan)
- ExplainSubPlans(planstate->initPlan, ancestors, "InitPlan", es);
+ ReportSubPlans(planstate->initPlan, ancestors, "InitPlan", es, ExplainNode);
/* lefttree */
if (outerPlanState(planstate))
- ExplainNode(outerPlanState(planstate), ancestors,
- "Outer", NULL, es);
+ ExplainNode(outerPlanState(planstate), ancestors, "Outer", NULL, es);
/* righttree */
if (innerPlanState(planstate))
- ExplainNode(innerPlanState(planstate), ancestors,
- "Inner", NULL, es);
+ ExplainNode(innerPlanState(planstate), ancestors, "Inner", NULL, es);
/* special child plans */
switch (nodeTag(plan))
{
case T_ModifyTable:
- ExplainMemberNodes(((ModifyTable *) plan)->plans,
- ((ModifyTableState *) planstate)->mt_plans,
- ancestors, es);
+ ReportMemberNodes(((ModifyTable *) plan)->plans,
+ ((ModifyTableState *) planstate)->mt_plans, ancestors, es, ExplainNode);
break;
case T_Append:
- ExplainMemberNodes(((Append *) plan)->appendplans,
- ((AppendState *) planstate)->appendplans,
- ancestors, es);
+ ReportMemberNodes(((Append *) plan)->appendplans,
+ ((AppendState *) planstate)->appendplans, ancestors, es, ExplainNode);
break;
case T_MergeAppend:
- ExplainMemberNodes(((MergeAppend *) plan)->mergeplans,
- ((MergeAppendState *) planstate)->mergeplans,
- ancestors, es);
+ ReportMemberNodes(((MergeAppend *) plan)->mergeplans,
+ ((MergeAppendState *) planstate)->mergeplans, ancestors, es, ExplainNode);
break;
case T_BitmapAnd:
- ExplainMemberNodes(((BitmapAnd *) plan)->bitmapplans,
- ((BitmapAndState *) planstate)->bitmapplans,
- ancestors, es);
+ ReportMemberNodes(((BitmapAnd *) plan)->bitmapplans,
+ ((BitmapAndState *) planstate)->bitmapplans, ancestors, es, ExplainNode);
break;
case T_BitmapOr:
- ExplainMemberNodes(((BitmapOr *) plan)->bitmapplans,
- ((BitmapOrState *) planstate)->bitmapplans,
- ancestors, es);
+ ReportMemberNodes(((BitmapOr *) plan)->bitmapplans,
+ ((BitmapOrState *) planstate)->bitmapplans, ancestors, es, ExplainNode);
break;
case T_SubqueryScan:
- ExplainNode(((SubqueryScanState *) planstate)->subplan, ancestors,
- "Subquery", NULL, es);
+ ExplainNode(((SubqueryScanState *) planstate)->subplan, ancestors, "Subquery", NULL, es);
break;
case T_CustomScan:
- ExplainCustomChildren((CustomScanState *) planstate,
- ancestors, es);
+ ReportCustomChildren((CustomScanState *) planstate, ancestors, es, ExplainNode);
break;
default:
break;
@@ -1763,1764 +1072,323 @@ ExplainNode(PlanState *planstate, List *ancestors,
/* subPlan-s */
if (planstate->subPlan)
- ExplainSubPlans(planstate->subPlan, ancestors, "SubPlan", es);
+ ReportSubPlans(planstate->subPlan, ancestors, "SubPlan", es, ExplainNode);
/* end of child plans */
if (haschildren)
{
ancestors = list_delete_first(ancestors);
- ExplainCloseGroup("Plans", "Plans", false, es);
+ ReportCloseGroup("Plans", "Plans", false, es);
}
/* in text format, undo whatever indentation we added */
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
es->indent = save_indent;
- ExplainCloseGroup("Plan",
+ ReportCloseGroup("Plan",
relationship ? NULL : "Plan",
true, es);
}
/*
- * Show the targetlist of a plan node
+ * Show buffer usage details.
*/
-static void
-show_plan_tlist(PlanState *planstate, List *ancestors, ExplainState *es)
+void
+show_buffer_usage(ReportState *es, const BufferUsage *usage)
{
- Plan *plan = planstate->plan;
- List *context;
- List *result = NIL;
- bool useprefix;
- ListCell *lc;
+ if (es->format == REPORT_FORMAT_TEXT)
+ {
+ bool has_shared = (usage->shared_blks_hit > 0 ||
+ usage->shared_blks_read > 0 ||
+ usage->shared_blks_dirtied > 0 ||
+ usage->shared_blks_written > 0);
+ bool has_local = (usage->local_blks_hit > 0 ||
+ usage->local_blks_read > 0 ||
+ usage->local_blks_dirtied > 0 ||
+ usage->local_blks_written > 0);
+ bool has_temp = (usage->temp_blks_read > 0 ||
+ usage->temp_blks_written > 0);
+ bool has_timing = (!INSTR_TIME_IS_ZERO(usage->blk_read_time) ||
+ !INSTR_TIME_IS_ZERO(usage->blk_write_time));
- /* No work if empty tlist (this occurs eg in bitmap indexscans) */
- if (plan->targetlist == NIL)
- return;
- /* The tlist of an Append isn't real helpful, so suppress it */
- if (IsA(plan, Append))
- return;
- /* Likewise for MergeAppend and RecursiveUnion */
- if (IsA(plan, MergeAppend))
- return;
- if (IsA(plan, RecursiveUnion))
- return;
+ /* Show only positive counter values. */
+ if (has_shared || has_local || has_temp)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfoString(es->str, "Buffers:");
- /*
- * Likewise for ForeignScan that executes a direct INSERT/UPDATE/DELETE
- *
- * Note: the tlist for a ForeignScan that executes a direct INSERT/UPDATE
- * might contain subplan output expressions that are confusing in this
- * context. The tlist for a ForeignScan that executes a direct UPDATE/
- * DELETE always contains "junk" target columns to identify the exact row
- * to update or delete, which would be confusing in this context. So, we
- * suppress it in all the cases.
- */
- if (IsA(plan, ForeignScan) &&
- ((ForeignScan *) plan)->operation != CMD_SELECT)
- return;
+ if (has_shared)
+ {
+ appendStringInfoString(es->str, " shared");
+ if (usage->shared_blks_hit > 0)
+ appendStringInfo(es->str, " hit=%ld",
+ usage->shared_blks_hit);
- /* Set up deparsing context */
- context = set_deparse_context_planstate(es->deparse_cxt,
- (Node *) planstate,
- ancestors);
- useprefix = list_length(es->rtable) > 1;
-
- /* Deparse each result column (we now include resjunk ones) */
- foreach(lc, plan->targetlist)
- {
- TargetEntry *tle = (TargetEntry *) lfirst(lc);
-
- result = lappend(result,
- deparse_expression((Node *) tle->expr, context,
- useprefix, false));
- }
-
- /* Print results */
- ExplainPropertyList("Output", result, es);
-}
-
-/*
- * Show a generic expression
- */
-static void
-show_expression(Node *node, const char *qlabel,
- PlanState *planstate, List *ancestors,
- bool useprefix, ExplainState *es)
-{
- List *context;
- char *exprstr;
-
- /* Set up deparsing context */
- context = set_deparse_context_planstate(es->deparse_cxt,
- (Node *) planstate,
- ancestors);
-
- /* Deparse the expression */
- exprstr = deparse_expression(node, context, useprefix, false);
+ if (usage->shared_blks_read > 0)
+ appendStringInfo(es->str, " read=%ld",
+ usage->shared_blks_read);
- /* And add to es->str */
- ExplainPropertyText(qlabel, exprstr, es);
-}
+ if (usage->shared_blks_dirtied > 0)
+ appendStringInfo(es->str, " dirtied=%ld",
+ usage->shared_blks_dirtied);
-/*
- * Show a qualifier expression (which is a List with implicit AND semantics)
- */
-static void
-show_qual(List *qual, const char *qlabel,
- PlanState *planstate, List *ancestors,
- bool useprefix, ExplainState *es)
-{
- Node *node;
+ if (usage->shared_blks_written > 0)
+ appendStringInfo(es->str, " written=%ld",
+ usage->shared_blks_written);
- /* No work if empty qual */
- if (qual == NIL)
- return;
+ if (has_local || has_temp)
+ appendStringInfoChar(es->str, ',');
+ }
- /* Convert AND list to explicit AND */
- node = (Node *) make_ands_explicit(qual);
+ if (has_local)
+ {
+ appendStringInfoString(es->str, " local");
+ if (usage->local_blks_hit > 0)
+ appendStringInfo(es->str, " hit=%ld",
+ usage->local_blks_hit);
- /* And show it */
- show_expression(node, qlabel, planstate, ancestors, useprefix, es);
-}
+ if (usage->local_blks_read > 0)
+ appendStringInfo(es->str, " read=%ld",
+ usage->local_blks_read);
-/*
- * Show a qualifier expression for a scan plan node
- */
-static void
-show_scan_qual(List *qual, const char *qlabel,
- PlanState *planstate, List *ancestors,
- ExplainState *es)
-{
- bool useprefix;
+ if (usage->local_blks_dirtied > 0)
+ appendStringInfo(es->str, " dirtied=%ld",
+ usage->local_blks_dirtied);
- useprefix = (IsA(planstate->plan, SubqueryScan) ||es->verbose);
- show_qual(qual, qlabel, planstate, ancestors, useprefix, es);
-}
+ if (usage->local_blks_written > 0)
+ appendStringInfo(es->str, " written=%ld",
+ usage->local_blks_written);
-/*
- * Show a qualifier expression for an upper-level plan node
- */
-static void
-show_upper_qual(List *qual, const char *qlabel,
- PlanState *planstate, List *ancestors,
- ExplainState *es)
-{
- bool useprefix;
+ if (has_temp)
+ appendStringInfoChar(es->str, ',');
+ }
- useprefix = (list_length(es->rtable) > 1 || es->verbose);
- show_qual(qual, qlabel, planstate, ancestors, useprefix, es);
-}
+ if (has_temp)
+ {
+ appendStringInfoString(es->str, " temp");
+ if (usage->temp_blks_read > 0)
+ appendStringInfo(es->str, " read=%ld",
+ usage->temp_blks_read);
-/*
- * Show the sort keys for a Sort node.
- */
-static void
-show_sort_keys(SortState *sortstate, List *ancestors, ExplainState *es)
-{
- Sort *plan = (Sort *) sortstate->ss.ps.plan;
+ if (usage->temp_blks_written > 0)
+ appendStringInfo(es->str, " written=%ld",
+ usage->temp_blks_written);
+ }
- show_sort_group_keys((PlanState *) sortstate, "Sort Key",
- plan->numCols, plan->sortColIdx,
- plan->sortOperators, plan->collations,
- plan->nullsFirst,
- ancestors, es);
-}
+ appendStringInfoChar(es->str, '\n');
+ }
-/*
- * Likewise, for a MergeAppend node.
- */
-static void
-show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
- ExplainState *es)
-{
- MergeAppend *plan = (MergeAppend *) mstate->ps.plan;
+ /* As above, show only positive counter values. */
+ if (has_timing)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfoString(es->str, "I/O Timings:");
- show_sort_group_keys((PlanState *) mstate, "Sort Key",
- plan->numCols, plan->sortColIdx,
- plan->sortOperators, plan->collations,
- plan->nullsFirst,
- ancestors, es);
-}
+ if (!INSTR_TIME_IS_ZERO(usage->blk_read_time))
+ appendStringInfo(es->str, " read=%0.3f",
+ INSTR_TIME_GET_MILLISEC(usage->blk_read_time));
-/*
- * Show the grouping keys for an Agg node.
- */
-static void
-show_agg_keys(AggState *astate, List *ancestors,
- ExplainState *es)
-{
- Agg *plan = (Agg *) astate->ss.ps.plan;
+ if (!INSTR_TIME_IS_ZERO(usage->blk_write_time))
+ appendStringInfo(es->str, " write=%0.3f",
+ INSTR_TIME_GET_MILLISEC(usage->blk_write_time));
- if (plan->numCols > 0 || plan->groupingSets)
+ appendStringInfoChar(es->str, '\n');
+ }
+ }
+ else
{
- /* The key columns refer to the tlist of the child plan */
- ancestors = lcons(astate, ancestors);
-
- if (plan->groupingSets)
- show_grouping_sets(outerPlanState(astate), plan, ancestors, es);
- else
- show_sort_group_keys(outerPlanState(astate), "Group Key",
- plan->numCols, plan->grpColIdx,
- NULL, NULL, NULL,
- ancestors, es);
-
- ancestors = list_delete_first(ancestors);
+ ReportPropertyLong("Shared Hit Blocks", usage->shared_blks_hit, es);
+ ReportPropertyLong("Shared Read Blocks", usage->shared_blks_read, es);
+ ReportPropertyLong("Shared Dirtied Blocks", usage->shared_blks_dirtied, es);
+ ReportPropertyLong("Shared Written Blocks", usage->shared_blks_written, es);
+ ReportPropertyLong("Local Hit Blocks", usage->local_blks_hit, es);
+ ReportPropertyLong("Local Read Blocks", usage->local_blks_read, es);
+ ReportPropertyLong("Local Dirtied Blocks", usage->local_blks_dirtied, es);
+ ReportPropertyLong("Local Written Blocks", usage->local_blks_written, es);
+ ReportPropertyLong("Temp Read Blocks", usage->temp_blks_read, es);
+ ReportPropertyLong("Temp Written Blocks", usage->temp_blks_written, es);
+ if (track_io_timing)
+ {
+ ReportPropertyFloat("I/O Read Time", INSTR_TIME_GET_MILLISEC(usage->blk_read_time), 3, es);
+ ReportPropertyFloat("I/O Write Time", INSTR_TIME_GET_MILLISEC(usage->blk_write_time), 3, es);
+ }
}
}
+/*
+ * Add some additional details about an IndexScan or IndexOnlyScan
+ */
static void
-show_grouping_sets(PlanState *planstate, Agg *agg,
- List *ancestors, ExplainState *es)
+ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
+ ReportState *es)
{
- List *context;
- bool useprefix;
- ListCell *lc;
-
- /* Set up deparsing context */
- context = set_deparse_context_planstate(es->deparse_cxt,
- (Node *) planstate,
- ancestors);
- useprefix = (list_length(es->rtable) > 1 || es->verbose);
-
- ExplainOpenGroup("Grouping Sets", "Grouping Sets", false, es);
-
- show_grouping_set_keys(planstate, agg, NULL,
- context, useprefix, ancestors, es);
+ const char *indexname = explain_get_index_name(indexid);
- foreach(lc, agg->chain)
+ if (es->format == REPORT_FORMAT_TEXT)
{
- Agg *aggnode = lfirst(lc);
- Sort *sortnode = (Sort *) aggnode->plan.lefttree;
-
- show_grouping_set_keys(planstate, aggnode, sortnode,
- context, useprefix, ancestors, es);
- }
-
- ExplainCloseGroup("Grouping Sets", "Grouping Sets", false, es);
-}
-
-static void
-show_grouping_set_keys(PlanState *planstate,
- Agg *aggnode, Sort *sortnode,
- List *context, bool useprefix,
- List *ancestors, ExplainState *es)
-{
- Plan *plan = planstate->plan;
- char *exprstr;
- ListCell *lc;
- List *gsets = aggnode->groupingSets;
- AttrNumber *keycols = aggnode->grpColIdx;
- const char *keyname;
- const char *keysetname;
+ if (ScanDirectionIsBackward(indexorderdir))
+ appendStringInfoString(es->str, " Backward");
- if (aggnode->aggstrategy == AGG_HASHED || aggnode->aggstrategy == AGG_MIXED)
- {
- keyname = "Hash Key";
- keysetname = "Hash Keys";
+ appendStringInfo(es->str, " using %s", indexname);
}
else
{
- keyname = "Group Key";
- keysetname = "Group Keys";
- }
-
- ExplainOpenGroup("Grouping Set", NULL, true, es);
-
- if (sortnode)
- {
- show_sort_group_keys(planstate, "Sort Key",
- sortnode->numCols, sortnode->sortColIdx,
- sortnode->sortOperators, sortnode->collations,
- sortnode->nullsFirst,
- ancestors, es);
- if (es->format == EXPLAIN_FORMAT_TEXT)
- es->indent++;
- }
-
- ExplainOpenGroup(keysetname, keysetname, false, es);
-
- foreach(lc, gsets)
- {
- List *result = NIL;
- ListCell *lc2;
+ const char *scandir;

- foreach(lc2, (List *) lfirst(lc))
+ switch (indexorderdir)
{
- Index i = lfirst_int(lc2);
- AttrNumber keyresno = keycols[i];
- TargetEntry *target = get_tle_by_resno(plan->targetlist,
- keyresno);
-
- if (!target)
- elog(ERROR, "no tlist entry for key %d", keyresno);
- /* Deparse the expression, showing any top-level cast */
- exprstr = deparse_expression((Node *) target->expr, context,
- useprefix, true);
-
- result = lappend(result, exprstr);
+ case BackwardScanDirection:
+ scandir = "Backward";
+ break;
+ case NoMovementScanDirection:
+ scandir = "NoMovement";
+ break;
+ case ForwardScanDirection:
+ scandir = "Forward";
+ break;
+ default:
+ scandir = "???";
+ break;
}
- if (!result && es->format == EXPLAIN_FORMAT_TEXT)
- ExplainPropertyText(keyname, "()", es);
- else
- ExplainPropertyListNested(keyname, result, es);
+ ReportPropertyText("Scan Direction", scandir, es);
+ ReportPropertyText("Index Name", indexname, es);
}
-
- ExplainCloseGroup(keysetname, keysetname, false, es);
-
- if (sortnode && es->format == EXPLAIN_FORMAT_TEXT)
- es->indent--;
-
- ExplainCloseGroup("Grouping Set", NULL, true, es);
}

/*
- * Show the grouping keys for a Group node.
+ * Show the target of a Scan node
*/
static void
-show_group_keys(GroupState *gstate, List *ancestors,
- ExplainState *es)
+ExplainScanTarget(Scan *plan, ReportState *es)
{
- Group *plan = (Group *) gstate->ss.ps.plan;
-
- /* The key columns refer to the tlist of the child plan */
- ancestors = lcons(gstate, ancestors);
- show_sort_group_keys(outerPlanState(gstate), "Group Key",
- plan->numCols, plan->grpColIdx,
- NULL, NULL, NULL,
- ancestors, es);
- ancestors = list_delete_first(ancestors);
+ ExplainTargetRel((Plan *) plan, plan->scanrelid, es);
}

/*
- * Common code to show sort/group keys, which are represented in plan nodes
- * as arrays of targetlist indexes. If it's a sort key rather than a group
- * key, also pass sort operators/collations/nullsFirst arrays.
+ * Show the target of a ModifyTable node
+ *
+ * Here we show the nominal target (ie, the relation that was named in the
+ * original query). If the actual target(s) is/are different, we'll show them
+ * in show_modifytable_info().
*/
static void
-show_sort_group_keys(PlanState *planstate, const char *qlabel,
- int nkeys, AttrNumber *keycols,
- Oid *sortOperators, Oid *collations, bool *nullsFirst,
- List *ancestors, ExplainState *es)
+ExplainModifyTarget(ModifyTable *plan, ReportState *es)
{
- Plan *plan = planstate->plan;
- List *context;
- List *result = NIL;
- StringInfoData sortkeybuf;
- bool useprefix;
- int keyno;
-
- if (nkeys <= 0)
- return;
-
- initStringInfo(&sortkeybuf);
-
- /* Set up deparsing context */
- context = set_deparse_context_planstate(es->deparse_cxt,
- (Node *) planstate,
- ancestors);
- useprefix = (list_length(es->rtable) > 1 || es->verbose);
-
- for (keyno = 0; keyno < nkeys; keyno++)
- {
- /* find key expression in tlist */
- AttrNumber keyresno = keycols[keyno];
- TargetEntry *target = get_tle_by_resno(plan->targetlist,
- keyresno);
- char *exprstr;
-
- if (!target)
- elog(ERROR, "no tlist entry for key %d", keyresno);
- /* Deparse the expression, showing any top-level cast */
- exprstr = deparse_expression((Node *) target->expr, context,
- useprefix, true);
- resetStringInfo(&sortkeybuf);
- appendStringInfoString(&sortkeybuf, exprstr);
- /* Append sort order information, if relevant */
- if (sortOperators != NULL)
- show_sortorder_options(&sortkeybuf,
- (Node *) target->expr,
- sortOperators[keyno],
- collations[keyno],
- nullsFirst[keyno]);
- /* Emit one property-list item per sort key */
- result = lappend(result, pstrdup(sortkeybuf.data));
- }
-
- ExplainPropertyList(qlabel, result, es);
+ ExplainTargetRel((Plan *) plan, plan->nominalRelation, es);
}

/*
- * Append nondefault characteristics of the sort ordering of a column to buf
- * (collation, direction, NULLS FIRST/LAST)
+ * Show the target relation of a scan or modify node
*/
static void
-show_sortorder_options(StringInfo buf, Node *sortexpr,
- Oid sortOperator, Oid collation, bool nullsFirst)
+ExplainTargetRel(Plan *plan, Index rti, ReportState *es)
{
- Oid sortcoltype = exprType(sortexpr);
- bool reverse = false;
- TypeCacheEntry *typentry;
+ char *objectname = NULL;
+ char *namespace = NULL;
+ const char *objecttag = NULL;
+ RangeTblEntry *rte;
+ char *refname;

- typentry = lookup_type_cache(sortcoltype,
- TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+ rte = rt_fetch(rti, es->rtable);
+ refname = (char *) list_nth(es->rtable_names, rti - 1);
+ if (refname == NULL)
+ refname = rte->eref->aliasname;
+
+ switch (nodeTag(plan))
- /*
- * Print COLLATE if it's not default. There are some cases where this is
- * redundant, eg if expression is a column whose declared collation is
- * that collation, but it's hard to distinguish that here.
- */
- if (OidIsValid(collation) && collation != DEFAULT_COLLATION_OID)
{
- char *collname = get_collation_name(collation);
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_ForeignScan:
+ case T_CustomScan:
+ case T_ModifyTable:
+ /* Assert it's on a real relation */
+ Assert(rte->rtekind == RTE_RELATION);
+ objectname = get_rel_name(rte->relid);
+ if (es->verbose)
+ namespace = get_namespace_name(get_rel_namespace(rte->relid));
- if (collname == NULL)
- elog(ERROR, "cache lookup failed for collation %u", collation);
- appendStringInfo(buf, " COLLATE %s", quote_identifier(collname));
- }
+ objecttag = "Relation Name";
- /* Print direction if not ASC, or USING if non-default sort operator */
- if (sortOperator == typentry->gt_opr)
- {
- appendStringInfoString(buf, " DESC");
- reverse = true;
- }
- else if (sortOperator != typentry->lt_opr)
- {
- char *opname = get_opname(sortOperator);
+ break;
- if (opname == NULL)
- elog(ERROR, "cache lookup failed for operator %u", sortOperator);
- appendStringInfo(buf, " USING %s", opname);
- /* Determine whether operator would be considered ASC or DESC */
- (void) get_equality_op_for_ordering_op(sortOperator, &reverse);
- }
+ case T_FunctionScan:
+ {
+ FunctionScan *fscan = (FunctionScan *) plan;

- /* Add NULLS FIRST/LAST only if it wouldn't be default */
- if (nullsFirst && !reverse)
- {
- appendStringInfoString(buf, " NULLS FIRST");
- }
- else if (!nullsFirst && reverse)
- {
- appendStringInfoString(buf, " NULLS LAST");
- }
-}
+ /* Assert it's on a RangeFunction */
+ Assert(rte->rtekind == RTE_FUNCTION);
-/*
- * Show TABLESAMPLE properties
- */
-static void
-show_tablesample(TableSampleClause *tsc, PlanState *planstate,
- List *ancestors, ExplainState *es)
-{
- List *context;
- bool useprefix;
- char *method_name;
- List *params = NIL;
- char *repeatable;
- ListCell *lc;
+ /*
+ * If the expression is still a function call of a single
+ * function, we can get the real name of the function.
+ * Otherwise, punt. (Even if it was a single function call
+ * originally, the optimizer could have simplified it away.)
+ */
+ if (list_length(fscan->functions) == 1)
+ {
+ RangeTblFunction *rtfunc = (RangeTblFunction *) linitial(fscan->functions);

- /* Set up deparsing context */
- context = set_deparse_context_planstate(es->deparse_cxt,
- (Node *) planstate,
- ancestors);
- useprefix = list_length(es->rtable) > 1;
+ if (IsA(rtfunc->funcexpr, FuncExpr))
+ {
+ FuncExpr *funcexpr = (FuncExpr *) rtfunc->funcexpr;
+ Oid funcid = funcexpr->funcid;

- /* Get the tablesample method name */
- method_name = get_func_name(tsc->tsmhandler);
+ objectname = get_func_name(funcid);
+ if (es->verbose)
+ namespace =
+ get_namespace_name(get_func_namespace(funcid));
+ }
+ }
+ objecttag = "Function Name";
- /* Deparse parameter expressions */
- foreach(lc, tsc->args)
- {
- Node *arg = (Node *) lfirst(lc);
+ }
+ break;
- params = lappend(params,
- deparse_expression(arg, context,
- useprefix, false));
+ case T_ValuesScan:
+ Assert(rte->rtekind == RTE_VALUES);
+ break;
+ case T_CteScan:
+ /* Assert it's on a non-self-reference CTE */
+ Assert(rte->rtekind == RTE_CTE);
+ Assert(!rte->self_reference);
+ objectname = rte->ctename;
+ objecttag = "CTE Name";
+ break;
+ case T_WorkTableScan:
+ /* Assert it's on a self-reference CTE */
+ Assert(rte->rtekind == RTE_CTE);
+ Assert(rte->self_reference);
+ objectname = rte->ctename;
+ objecttag = "CTE Name";
+ break;
+ default:
+ break;
}

- if (tsc->repeatable)
- repeatable = deparse_expression((Node *) tsc->repeatable, context,
- useprefix, false);
- else
- repeatable = NULL;
- /* Print results */
- if (es->format == EXPLAIN_FORMAT_TEXT)
+ if (es->format == REPORT_FORMAT_TEXT)
{
- bool first = true;
+ appendStringInfoString(es->str, " on");
+ if (namespace != NULL)
+ appendStringInfo(es->str, " %s.%s", quote_identifier(namespace),
+ quote_identifier(objectname));
+ else if (objectname != NULL)
+ appendStringInfo(es->str, " %s", quote_identifier(objectname));
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "Sampling: %s (", method_name);
- foreach(lc, params)
- {
- if (!first)
- appendStringInfoString(es->str, ", ");
- appendStringInfoString(es->str, (const char *) lfirst(lc));
- first = false;
- }
- appendStringInfoChar(es->str, ')');
- if (repeatable)
- appendStringInfo(es->str, " REPEATABLE (%s)", repeatable);
- appendStringInfoChar(es->str, '\n');
+ if (objectname == NULL || strcmp(refname, objectname) != 0)
+ appendStringInfo(es->str, " %s", quote_identifier(refname));
}
else
{
- ExplainPropertyText("Sampling Method", method_name, es);
- ExplainPropertyList("Sampling Parameters", params, es);
- if (repeatable)
- ExplainPropertyText("Repeatable Seed", repeatable, es);
- }
-}
-
-/*
- * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
- */
-static void
-show_sort_info(SortState *sortstate, ExplainState *es)
-{
- if (es->analyze && sortstate->sort_Done &&
- sortstate->tuplesortstate != NULL)
- {
- Tuplesortstate *state = (Tuplesortstate *) sortstate->tuplesortstate;
- const char *sortMethod;
- const char *spaceType;
- long spaceUsed;
-
- tuplesort_get_stats(state, &sortMethod, &spaceType, &spaceUsed);
-
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "Sort Method: %s %s: %ldkB\n",
- sortMethod, spaceType, spaceUsed);
- }
- else
- {
- ExplainPropertyText("Sort Method", sortMethod, es);
- ExplainPropertyLong("Sort Space Used", spaceUsed, es);
- ExplainPropertyText("Sort Space Type", spaceType, es);
- }
+ if (objecttag != NULL && objectname != NULL)
+ ReportPropertyText(objecttag, objectname, es);
+ if (namespace != NULL)
+ ReportPropertyText("Schema", namespace, es);
+ ReportPropertyText("Alias", refname, es);
}
}
-
-/*
- * Show information on hash buckets/batches.
- */
-static void
-show_hash_info(HashState *hashstate, ExplainState *es)
-{
- HashJoinTable hashtable;
-
- hashtable = hashstate->hashtable;
-
- if (hashtable)
- {
- long spacePeakKb = (hashtable->spacePeak + 1023) / 1024;
-
- if (es->format != EXPLAIN_FORMAT_TEXT)
- {
- ExplainPropertyLong("Hash Buckets", hashtable->nbuckets, es);
- ExplainPropertyLong("Original Hash Buckets",
- hashtable->nbuckets_original, es);
- ExplainPropertyLong("Hash Batches", hashtable->nbatch, es);
- ExplainPropertyLong("Original Hash Batches",
- hashtable->nbatch_original, es);
- ExplainPropertyLong("Peak Memory Usage", spacePeakKb, es);
- }
- else if (hashtable->nbatch_original != hashtable->nbatch ||
- hashtable->nbuckets_original != hashtable->nbuckets)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str,
- "Buckets: %d (originally %d) Batches: %d (originally %d) Memory Usage: %ldkB\n",
- hashtable->nbuckets,
- hashtable->nbuckets_original,
- hashtable->nbatch,
- hashtable->nbatch_original,
- spacePeakKb);
- }
- else
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str,
- "Buckets: %d Batches: %d Memory Usage: %ldkB\n",
- hashtable->nbuckets, hashtable->nbatch,
- spacePeakKb);
- }
- }
-}
-
-/*
- * If it's EXPLAIN ANALYZE, show exact/lossy pages for a BitmapHeapScan node
- */
-static void
-show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
-{
- if (es->format != EXPLAIN_FORMAT_TEXT)
- {
- ExplainPropertyLong("Exact Heap Blocks", planstate->exact_pages, es);
- ExplainPropertyLong("Lossy Heap Blocks", planstate->lossy_pages, es);
- }
- else
- {
- if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfoString(es->str, "Heap Blocks:");
- if (planstate->exact_pages > 0)
- appendStringInfo(es->str, " exact=%ld", planstate->exact_pages);
- if (planstate->lossy_pages > 0)
- appendStringInfo(es->str, " lossy=%ld", planstate->lossy_pages);
- appendStringInfoChar(es->str, '\n');
- }
- }
-}
-
-/*
- * If it's EXPLAIN ANALYZE, show instrumentation information for a plan node
- *
- * "which" identifies which instrumentation counter to print
- */
-static void
-show_instrumentation_count(const char *qlabel, int which,
- PlanState *planstate, ExplainState *es)
-{
- double nfiltered;
- double nloops;
-
- if (!es->analyze || !planstate->instrument)
- return;
-
- if (which == 2)
- nfiltered = planstate->instrument->nfiltered2;
- else
- nfiltered = planstate->instrument->nfiltered1;
- nloops = planstate->instrument->nloops;
-
- /* In text mode, suppress zero counts; they're not interesting enough */
- if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
- {
- if (nloops > 0)
- ExplainPropertyFloat(qlabel, nfiltered / nloops, 0, es);
- else
- ExplainPropertyFloat(qlabel, 0.0, 0, es);
- }
-}
-
-/*
- * Show extra information for a ForeignScan node.
- */
-static void
-show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
-{
- FdwRoutine *fdwroutine = fsstate->fdwroutine;
-
- /* Let the FDW emit whatever fields it wants */
- if (((ForeignScan *) fsstate->ss.ps.plan)->operation != CMD_SELECT)
- {
- if (fdwroutine->ExplainDirectModify != NULL)
- fdwroutine->ExplainDirectModify(fsstate, es);
- }
- else
- {
- if (fdwroutine->ExplainForeignScan != NULL)
- fdwroutine->ExplainForeignScan(fsstate, es);
- }
-}
-
-/*
- * Fetch the name of an index in an EXPLAIN
- *
- * We allow plugins to get control here so that plans involving hypothetical
- * indexes can be explained.
- */
-static const char *
-explain_get_index_name(Oid indexId)
-{
- const char *result;
-
- if (explain_get_index_name_hook)
- result = (*explain_get_index_name_hook) (indexId);
- else
- result = NULL;
- if (result == NULL)
- {
- /* default behavior: look in the catalogs and quote it */
- result = get_rel_name(indexId);
- if (result == NULL)
- elog(ERROR, "cache lookup failed for index %u", indexId);
- result = quote_identifier(result);
- }
- return result;
-}
-
-/*
- * Show buffer usage details.
- */
-static void
-show_buffer_usage(ExplainState *es, const BufferUsage *usage)
-{
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- bool has_shared = (usage->shared_blks_hit > 0 ||
- usage->shared_blks_read > 0 ||
- usage->shared_blks_dirtied > 0 ||
- usage->shared_blks_written > 0);
- bool has_local = (usage->local_blks_hit > 0 ||
- usage->local_blks_read > 0 ||
- usage->local_blks_dirtied > 0 ||
- usage->local_blks_written > 0);
- bool has_temp = (usage->temp_blks_read > 0 ||
- usage->temp_blks_written > 0);
- bool has_timing = (!INSTR_TIME_IS_ZERO(usage->blk_read_time) ||
- !INSTR_TIME_IS_ZERO(usage->blk_write_time));
-
- /* Show only positive counter values. */
- if (has_shared || has_local || has_temp)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfoString(es->str, "Buffers:");
-
- if (has_shared)
- {
- appendStringInfoString(es->str, " shared");
- if (usage->shared_blks_hit > 0)
- appendStringInfo(es->str, " hit=%ld",
- usage->shared_blks_hit);
- if (usage->shared_blks_read > 0)
- appendStringInfo(es->str, " read=%ld",
- usage->shared_blks_read);
- if (usage->shared_blks_dirtied > 0)
- appendStringInfo(es->str, " dirtied=%ld",
- usage->shared_blks_dirtied);
- if (usage->shared_blks_written > 0)
- appendStringInfo(es->str, " written=%ld",
- usage->shared_blks_written);
- if (has_local || has_temp)
- appendStringInfoChar(es->str, ',');
- }
- if (has_local)
- {
- appendStringInfoString(es->str, " local");
- if (usage->local_blks_hit > 0)
- appendStringInfo(es->str, " hit=%ld",
- usage->local_blks_hit);
- if (usage->local_blks_read > 0)
- appendStringInfo(es->str, " read=%ld",
- usage->local_blks_read);
- if (usage->local_blks_dirtied > 0)
- appendStringInfo(es->str, " dirtied=%ld",
- usage->local_blks_dirtied);
- if (usage->local_blks_written > 0)
- appendStringInfo(es->str, " written=%ld",
- usage->local_blks_written);
- if (has_temp)
- appendStringInfoChar(es->str, ',');
- }
- if (has_temp)
- {
- appendStringInfoString(es->str, " temp");
- if (usage->temp_blks_read > 0)
- appendStringInfo(es->str, " read=%ld",
- usage->temp_blks_read);
- if (usage->temp_blks_written > 0)
- appendStringInfo(es->str, " written=%ld",
- usage->temp_blks_written);
- }
- appendStringInfoChar(es->str, '\n');
- }
-
- /* As above, show only positive counter values. */
- if (has_timing)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfoString(es->str, "I/O Timings:");
- if (!INSTR_TIME_IS_ZERO(usage->blk_read_time))
- appendStringInfo(es->str, " read=%0.3f",
- INSTR_TIME_GET_MILLISEC(usage->blk_read_time));
- if (!INSTR_TIME_IS_ZERO(usage->blk_write_time))
- appendStringInfo(es->str, " write=%0.3f",
- INSTR_TIME_GET_MILLISEC(usage->blk_write_time));
- appendStringInfoChar(es->str, '\n');
- }
- }
- else
- {
- ExplainPropertyLong("Shared Hit Blocks", usage->shared_blks_hit, es);
- ExplainPropertyLong("Shared Read Blocks", usage->shared_blks_read, es);
- ExplainPropertyLong("Shared Dirtied Blocks", usage->shared_blks_dirtied, es);
- ExplainPropertyLong("Shared Written Blocks", usage->shared_blks_written, es);
- ExplainPropertyLong("Local Hit Blocks", usage->local_blks_hit, es);
- ExplainPropertyLong("Local Read Blocks", usage->local_blks_read, es);
- ExplainPropertyLong("Local Dirtied Blocks", usage->local_blks_dirtied, es);
- ExplainPropertyLong("Local Written Blocks", usage->local_blks_written, es);
- ExplainPropertyLong("Temp Read Blocks", usage->temp_blks_read, es);
- ExplainPropertyLong("Temp Written Blocks", usage->temp_blks_written, es);
- if (track_io_timing)
- {
- ExplainPropertyFloat("I/O Read Time", INSTR_TIME_GET_MILLISEC(usage->blk_read_time), 3, es);
- ExplainPropertyFloat("I/O Write Time", INSTR_TIME_GET_MILLISEC(usage->blk_write_time), 3, es);
- }
- }
-}
-
-/*
- * Add some additional details about an IndexScan or IndexOnlyScan
- */
-static void
-ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
- ExplainState *es)
-{
- const char *indexname = explain_get_index_name(indexid);
-
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- if (ScanDirectionIsBackward(indexorderdir))
- appendStringInfoString(es->str, " Backward");
- appendStringInfo(es->str, " using %s", indexname);
- }
- else
- {
- const char *scandir;
-
- switch (indexorderdir)
- {
- case BackwardScanDirection:
- scandir = "Backward";
- break;
- case NoMovementScanDirection:
- scandir = "NoMovement";
- break;
- case ForwardScanDirection:
- scandir = "Forward";
- break;
- default:
- scandir = "???";
- break;
- }
- ExplainPropertyText("Scan Direction", scandir, es);
- ExplainPropertyText("Index Name", indexname, es);
- }
-}
-
-/*
- * Show the target of a Scan node
- */
-static void
-ExplainScanTarget(Scan *plan, ExplainState *es)
-{
- ExplainTargetRel((Plan *) plan, plan->scanrelid, es);
-}
-
-/*
- * Show the target of a ModifyTable node
- *
- * Here we show the nominal target (ie, the relation that was named in the
- * original query). If the actual target(s) is/are different, we'll show them
- * in show_modifytable_info().
- */
-static void
-ExplainModifyTarget(ModifyTable *plan, ExplainState *es)
-{
- ExplainTargetRel((Plan *) plan, plan->nominalRelation, es);
-}
-
-/*
- * Show the target relation of a scan or modify node
- */
-static void
-ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
-{
- char *objectname = NULL;
- char *namespace = NULL;
- const char *objecttag = NULL;
- RangeTblEntry *rte;
- char *refname;
-
- rte = rt_fetch(rti, es->rtable);
- refname = (char *) list_nth(es->rtable_names, rti - 1);
- if (refname == NULL)
- refname = rte->eref->aliasname;
-
- switch (nodeTag(plan))
- {
- case T_SeqScan:
- case T_SampleScan:
- case T_IndexScan:
- case T_IndexOnlyScan:
- case T_BitmapHeapScan:
- case T_TidScan:
- case T_ForeignScan:
- case T_CustomScan:
- case T_ModifyTable:
- /* Assert it's on a real relation */
- Assert(rte->rtekind == RTE_RELATION);
- objectname = get_rel_name(rte->relid);
- if (es->verbose)
- namespace = get_namespace_name(get_rel_namespace(rte->relid));
- objecttag = "Relation Name";
- break;
- case T_FunctionScan:
- {
- FunctionScan *fscan = (FunctionScan *) plan;
-
- /* Assert it's on a RangeFunction */
- Assert(rte->rtekind == RTE_FUNCTION);
-
- /*
- * If the expression is still a function call of a single
- * function, we can get the real name of the function.
- * Otherwise, punt. (Even if it was a single function call
- * originally, the optimizer could have simplified it away.)
- */
- if (list_length(fscan->functions) == 1)
- {
- RangeTblFunction *rtfunc = (RangeTblFunction *) linitial(fscan->functions);
-
- if (IsA(rtfunc->funcexpr, FuncExpr))
- {
- FuncExpr *funcexpr = (FuncExpr *) rtfunc->funcexpr;
- Oid funcid = funcexpr->funcid;
-
- objectname = get_func_name(funcid);
- if (es->verbose)
- namespace =
- get_namespace_name(get_func_namespace(funcid));
- }
- }
- objecttag = "Function Name";
- }
- break;
- case T_TableFuncScan:
- Assert(rte->rtekind == RTE_TABLEFUNC);
- objectname = "xmltable";
- objecttag = "Table Function Name";
- break;
- case T_ValuesScan:
- Assert(rte->rtekind == RTE_VALUES);
- break;
- case T_CteScan:
- /* Assert it's on a non-self-reference CTE */
- Assert(rte->rtekind == RTE_CTE);
- Assert(!rte->self_reference);
- objectname = rte->ctename;
- objecttag = "CTE Name";
- break;
- case T_NamedTuplestoreScan:
- Assert(rte->rtekind == RTE_NAMEDTUPLESTORE);
- objectname = rte->enrname;
- objecttag = "Tuplestore Name";
- break;
- case T_WorkTableScan:
- /* Assert it's on a self-reference CTE */
- Assert(rte->rtekind == RTE_CTE);
- Assert(rte->self_reference);
- objectname = rte->ctename;
- objecttag = "CTE Name";
- break;
- default:
- break;
- }
-
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- appendStringInfoString(es->str, " on");
- if (namespace != NULL)
- appendStringInfo(es->str, " %s.%s", quote_identifier(namespace),
- quote_identifier(objectname));
- else if (objectname != NULL)
- appendStringInfo(es->str, " %s", quote_identifier(objectname));
- if (objectname == NULL || strcmp(refname, objectname) != 0)
- appendStringInfo(es->str, " %s", quote_identifier(refname));
- }
- else
- {
- if (objecttag != NULL && objectname != NULL)
- ExplainPropertyText(objecttag, objectname, es);
- if (namespace != NULL)
- ExplainPropertyText("Schema", namespace, es);
- ExplainPropertyText("Alias", refname, es);
- }
-}
-
-/*
- * Show extra information for a ModifyTable node
- *
- * We have three objectives here. First, if there's more than one target
- * table or it's different from the nominal target, identify the actual
- * target(s). Second, give FDWs a chance to display extra info about foreign
- * targets. Third, show information about ON CONFLICT.
- */
-static void
-show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
- ExplainState *es)
-{
- ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
- const char *operation;
- const char *foperation;
- bool labeltargets;
- int j;
- List *idxNames = NIL;
- ListCell *lst;
-
- switch (node->operation)
- {
- case CMD_INSERT:
- operation = "Insert";
- foperation = "Foreign Insert";
- break;
- case CMD_UPDATE:
- operation = "Update";
- foperation = "Foreign Update";
- break;
- case CMD_DELETE:
- operation = "Delete";
- foperation = "Foreign Delete";
- break;
- default:
- operation = "???";
- foperation = "Foreign ???";
- break;
- }
-
- /* Should we explicitly label target relations? */
- labeltargets = (mtstate->mt_nplans > 1 ||
- (mtstate->mt_nplans == 1 &&
- mtstate->resultRelInfo->ri_RangeTableIndex != node->nominalRelation));
-
- if (labeltargets)
- ExplainOpenGroup("Target Tables", "Target Tables", false, es);
-
- for (j = 0; j < mtstate->mt_nplans; j++)
- {
- ResultRelInfo *resultRelInfo = mtstate->resultRelInfo + j;
- FdwRoutine *fdwroutine = resultRelInfo->ri_FdwRoutine;
-
- if (labeltargets)
- {
- /* Open a group for this target */
- ExplainOpenGroup("Target Table", NULL, true, es);
-
- /*
- * In text mode, decorate each target with operation type, so that
- * ExplainTargetRel's output of " on foo" will read nicely.
- */
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfoString(es->str,
- fdwroutine ? foperation : operation);
- }
-
- /* Identify target */
- ExplainTargetRel((Plan *) node,
- resultRelInfo->ri_RangeTableIndex,
- es);
-
- if (es->format == EXPLAIN_FORMAT_TEXT)
- {
- appendStringInfoChar(es->str, '\n');
- es->indent++;
- }
- }
-
- /* Give FDW a chance if needed */
- if (!resultRelInfo->ri_usesFdwDirectModify &&
- fdwroutine != NULL &&
- fdwroutine->ExplainForeignModify != NULL)
- {
- List *fdw_private = (List *) list_nth(node->fdwPrivLists, j);
-
- fdwroutine->ExplainForeignModify(mtstate,
- resultRelInfo,
- fdw_private,
- j,
- es);
- }
-
- if (labeltargets)
- {
- /* Undo the indentation we added in text format */
- if (es->format == EXPLAIN_FORMAT_TEXT)
- es->indent--;
-
- /* Close the group */
- ExplainCloseGroup("Target Table", NULL, true, es);
- }
- }
-
- /* Gather names of ON CONFLICT arbiter indexes */
- foreach(lst, node->arbiterIndexes)
- {
- char *indexname = get_rel_name(lfirst_oid(lst));
-
- idxNames = lappend(idxNames, indexname);
- }
-
- if (node->onConflictAction != ONCONFLICT_NONE)
- {
- ExplainProperty("Conflict Resolution",
- node->onConflictAction == ONCONFLICT_NOTHING ?
- "NOTHING" : "UPDATE",
- false, es);
-
- /*
- * Don't display arbiter indexes at all when DO NOTHING variant
- * implicitly ignores all conflicts
- */
- if (idxNames)
- ExplainPropertyList("Conflict Arbiter Indexes", idxNames, es);
-
- /* ON CONFLICT DO UPDATE WHERE qual is specially displayed */
- if (node->onConflictWhere)
- {
- show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
- &mtstate->ps, ancestors, es);
- show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
- }
-
- /* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
- if (es->analyze && mtstate->ps.instrument)
- {
- double total;
- double insert_path;
- double other_path;
-
- InstrEndLoop(mtstate->mt_plans[0]->instrument);
-
- /* count the number of source rows */
- total = mtstate->mt_plans[0]->instrument->ntuples;
- other_path = mtstate->ps.instrument->nfiltered2;
- insert_path = total - other_path;
-
- ExplainPropertyFloat("Tuples Inserted", insert_path, 0, es);
- ExplainPropertyFloat("Conflicting Tuples", other_path, 0, es);
- }
- }
-
- if (labeltargets)
- ExplainCloseGroup("Target Tables", "Target Tables", false, es);
-}
-
-/*
- * Explain the constituent plans of a ModifyTable, Append, MergeAppend,
- * BitmapAnd, or BitmapOr node.
- *
- * The ancestors list should already contain the immediate parent of these
- * plans.
- *
- * Note: we don't actually need to examine the Plan list members, but
- * we need the list in order to determine the length of the PlanState array.
- */
-static void
-ExplainMemberNodes(List *plans, PlanState **planstates,
- List *ancestors, ExplainState *es)
-{
- int nplans = list_length(plans);
- int j;
-
- for (j = 0; j < nplans; j++)
- ExplainNode(planstates[j], ancestors,
- "Member", NULL, es);
-}
-
-/*
- * Explain a list of SubPlans (or initPlans, which also use SubPlan nodes).
- *
- * The ancestors list should already contain the immediate parent of these
- * SubPlanStates.
- */
-static void
-ExplainSubPlans(List *plans, List *ancestors,
- const char *relationship, ExplainState *es)
-{
- ListCell *lst;
-
- foreach(lst, plans)
- {
- SubPlanState *sps = (SubPlanState *) lfirst(lst);
- SubPlan *sp = sps->subplan;
-
- /*
- * There can be multiple SubPlan nodes referencing the same physical
- * subplan (same plan_id, which is its index in PlannedStmt.subplans).
- * We should print a subplan only once, so track which ones we already
- * printed. This state must be global across the plan tree, since the
- * duplicate nodes could be in different plan nodes, eg both a bitmap
- * indexscan's indexqual and its parent heapscan's recheck qual. (We
- * do not worry too much about which plan node we show the subplan as
- * attached to in such cases.)
- */
- if (bms_is_member(sp->plan_id, es->printed_subplans))
- continue;
- es->printed_subplans = bms_add_member(es->printed_subplans,
- sp->plan_id);
-
- ExplainNode(sps->planstate, ancestors,
- relationship, sp->plan_name, es);
- }
-}
-
-/*
- * Explain a list of children of a CustomScan.
- */
-static void
-ExplainCustomChildren(CustomScanState *css, List *ancestors, ExplainState *es)
-{
- ListCell *cell;
- const char *label =
- (list_length(css->custom_ps) != 1 ? "children" : "child");
-
- foreach(cell, css->custom_ps)
- ExplainNode((PlanState *) lfirst(cell), ancestors, label, NULL, es);
-}
-
-/*
- * Explain a property, such as sort keys or targets, that takes the form of
- * a list of unlabeled items. "data" is a list of C strings.
- */
-void
-ExplainPropertyList(const char *qlabel, List *data, ExplainState *es)
-{
- ListCell *lc;
- bool first = true;
-
- switch (es->format)
- {
- case EXPLAIN_FORMAT_TEXT:
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "%s: ", qlabel);
- foreach(lc, data)
- {
- if (!first)
- appendStringInfoString(es->str, ", ");
- appendStringInfoString(es->str, (const char *) lfirst(lc));
- first = false;
- }
- appendStringInfoChar(es->str, '\n');
- break;
-
- case EXPLAIN_FORMAT_XML:
- ExplainXMLTag(qlabel, X_OPENING, es);
- foreach(lc, data)
- {
- char *str;
-
- appendStringInfoSpaces(es->str, es->indent * 2 + 2);
- appendStringInfoString(es->str, "<Item>");
- str = escape_xml((const char *) lfirst(lc));
- appendStringInfoString(es->str, str);
- pfree(str);
- appendStringInfoString(es->str, "</Item>\n");
- }
- ExplainXMLTag(qlabel, X_CLOSING, es);
- break;
-
- case EXPLAIN_FORMAT_JSON:
- ExplainJSONLineEnding(es);
- appendStringInfoSpaces(es->str, es->indent * 2);
- escape_json(es->str, qlabel);
- appendStringInfoString(es->str, ": [");
- foreach(lc, data)
- {
- if (!first)
- appendStringInfoString(es->str, ", ");
- escape_json(es->str, (const char *) lfirst(lc));
- first = false;
- }
- appendStringInfoChar(es->str, ']');
- break;
-
- case EXPLAIN_FORMAT_YAML:
- ExplainYAMLLineStarting(es);
- appendStringInfo(es->str, "%s: ", qlabel);
- foreach(lc, data)
- {
- appendStringInfoChar(es->str, '\n');
- appendStringInfoSpaces(es->str, es->indent * 2 + 2);
- appendStringInfoString(es->str, "- ");
- escape_yaml(es->str, (const char *) lfirst(lc));
- }
- break;
- }
-}
-
-/*
- * Explain a property that takes the form of a list of unlabeled items within
- * another list. "data" is a list of C strings.
- */
-void
-ExplainPropertyListNested(const char *qlabel, List *data, ExplainState *es)
-{
- ListCell *lc;
- bool first = true;
-
- switch (es->format)
- {
- case EXPLAIN_FORMAT_TEXT:
- case EXPLAIN_FORMAT_XML:
- ExplainPropertyList(qlabel, data, es);
- return;
-
- case EXPLAIN_FORMAT_JSON:
- ExplainJSONLineEnding(es);
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfoChar(es->str, '[');
- foreach(lc, data)
- {
- if (!first)
- appendStringInfoString(es->str, ", ");
- escape_json(es->str, (const char *) lfirst(lc));
- first = false;
- }
- appendStringInfoChar(es->str, ']');
- break;
-
- case EXPLAIN_FORMAT_YAML:
- ExplainYAMLLineStarting(es);
- appendStringInfoString(es->str, "- [");
- foreach(lc, data)
- {
- if (!first)
- appendStringInfoString(es->str, ", ");
- escape_yaml(es->str, (const char *) lfirst(lc));
- first = false;
- }
- appendStringInfoChar(es->str, ']');
- break;
- }
-}
-
-/*
- * Explain a simple property.
- *
- * If "numeric" is true, the value is a number (or other value that
- * doesn't need quoting in JSON).
- *
- * This usually should not be invoked directly, but via one of the datatype
- * specific routines ExplainPropertyText, ExplainPropertyInteger, etc.
- */
-static void
-ExplainProperty(const char *qlabel, const char *value, bool numeric,
- ExplainState *es)
-{
- switch (es->format)
- {
- case EXPLAIN_FORMAT_TEXT:
- appendStringInfoSpaces(es->str, es->indent * 2);
- appendStringInfo(es->str, "%s: %s\n", qlabel, value);
- break;
-
- case EXPLAIN_FORMAT_XML:
- {
- char *str;
-
- appendStringInfoSpaces(es->str, es->indent * 2);
- ExplainXMLTag(qlabel, X_OPENING | X_NOWHITESPACE, es);
- str = escape_xml(value);
- appendStringInfoString(es->str, str);
- pfree(str);
- ExplainXMLTag(qlabel, X_CLOSING | X_NOWHITESPACE, es);
- appendStringInfoChar(es->str, '\n');
- }
- break;
-
- case EXPLAIN_FORMAT_JSON:
- ExplainJSONLineEnding(es);
- appendStringInfoSpaces(es->str, es->indent * 2);
- escape_json(es->str, qlabel);
- appendStringInfoString(es->str, ": ");
- if (numeric)
- appendStringInfoString(es->str, value);
- else
- escape_json(es->str, value);
- break;
-
- case EXPLAIN_FORMAT_YAML:
- ExplainYAMLLineStarting(es);
- appendStringInfo(es->str, "%s: ", qlabel);
- if (numeric)
- appendStringInfoString(es->str, value);
- else
- escape_yaml(es->str, value);
- break;
- }
-}
-
-/*
- * Explain a string-valued property.
- */
-void
-ExplainPropertyText(const char *qlabel, const char *value, ExplainState *es)
-{
- ExplainProperty(qlabel, value, false, es);
-}
-
-/*
- * Explain an integer-valued property.
- */
-void
-ExplainPropertyInteger(const char *qlabel, int value, ExplainState *es)
-{
- char buf[32];
-
- snprintf(buf, sizeof(buf), "%d", value);
- ExplainProperty(qlabel, buf, true, es);
-}
-
-/*
- * Explain a long-integer-valued property.
- */
-void
-ExplainPropertyLong(const char *qlabel, long value, ExplainState *es)
-{
- char buf[32];
-
- snprintf(buf, sizeof(buf), "%ld", value);
- ExplainProperty(qlabel, buf, true, es);
-}
-
-/*
- * Explain a float-valued property, using the specified number of
- * fractional digits.
- */
-void
-ExplainPropertyFloat(const char *qlabel, double value, int ndigits,
- ExplainState *es)
-{
- char buf[256];
-
- snprintf(buf, sizeof(buf), "%.*f", ndigits, value);
- ExplainProperty(qlabel, buf, true, es);
-}
-
-/*
- * Explain a bool-valued property.
- */
-void
-ExplainPropertyBool(const char *qlabel, bool value, ExplainState *es)
-{
- ExplainProperty(qlabel, value ? "true" : "false", true, es);
-}
-
-/*
- * Open a group of related objects.
- *
- * objtype is the type of the group object, labelname is its label within
- * a containing object (if any).
- *
- * If labeled is true, the group members will be labeled properties,
- * while if it's false, they'll be unlabeled objects.
- */
-static void
-ExplainOpenGroup(const char *objtype, const char *labelname,
- bool labeled, ExplainState *es)
-{
- switch (es->format)
- {
- case EXPLAIN_FORMAT_TEXT:
- /* nothing to do */
- break;
-
- case EXPLAIN_FORMAT_XML:
- ExplainXMLTag(objtype, X_OPENING, es);
- es->indent++;
- break;
-
- case EXPLAIN_FORMAT_JSON:
- ExplainJSONLineEnding(es);
- appendStringInfoSpaces(es->str, 2 * es->indent);
- if (labelname)
- {
- escape_json(es->str, labelname);
- appendStringInfoString(es->str, ": ");
- }
- appendStringInfoChar(es->str, labeled ? '{' : '[');
-
- /*
- * In JSON format, the grouping_stack is an integer list. 0 means
- * we've emitted nothing at this grouping level, 1 means we've
- * emitted something (and so the next item needs a comma). See
- * ExplainJSONLineEnding().
- */
- es->grouping_stack = lcons_int(0, es->grouping_stack);
- es->indent++;
- break;
-
- case EXPLAIN_FORMAT_YAML:
-
- /*
- * In YAML format, the grouping stack is an integer list. 0 means
- * we've emitted nothing at this grouping level AND this grouping
- * level is unlabelled and must be marked with "- ". See
- * ExplainYAMLLineStarting().
- */
- ExplainYAMLLineStarting(es);
- if (labelname)
- {
- appendStringInfo(es->str, "%s: ", labelname);
- es->grouping_stack = lcons_int(1, es->grouping_stack);
- }
- else
- {
- appendStringInfoString(es->str, "- ");
- es->grouping_stack = lcons_int(0, es->grouping_stack);
- }
- es->indent++;
- break;
- }
-}
-
-/*
- * Close a group of related objects.
- * Parameters must match the corresponding ExplainOpenGroup call.
- */
-static void
-ExplainCloseGroup(const char *objtype, const char *labelname,
- bool labeled, ExplainState *es)
-{
- switch (es->format)
- {
- case EXPLAIN_FORMAT_TEXT:
- /* nothing to do */
- break;
-
- case EXPLAIN_FORMAT_XML:
- es->indent--;
- ExplainXMLTag(objtype, X_CLOSING, es);
- break;
-
- case EXPLAIN_FORMAT_JSON:
- es->indent--;
- appendStringInfoChar(es->str, '\n');
- appendStringInfoSpaces(es->str, 2 * es->indent);
- appendStringInfoChar(es->str, labeled ? '}' : ']');
- es->grouping_stack = list_delete_first(es->grouping_stack);
- break;
-
- case EXPLAIN_FORMAT_YAML:
- es->indent--;
- es->grouping_stack = list_delete_first(es->grouping_stack);
- break;
- }
-}
-
-/*
- * Emit a "dummy" group that never has any members.
- *
- * objtype is the type of the group object, labelname is its label within
- * a containing object (if any).
- */
-static void
-ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
-{
- switch (es->format)
- {
- case EXPLAIN_FORMAT_TEXT:
- /* nothing to do */
- break;
-
- case EXPLAIN_FORMAT_XML:
- ExplainXMLTag(objtype, X_CLOSE_IMMEDIATE, es);
- break;
-
- case EXPLAIN_FORMAT_JSON:
- ExplainJSONLineEnding(es);
- appendStringInfoSpaces(es->str, 2 * es->indent);
- if (labelname)
- {
- escape_json(es->str, labelname);
- appendStringInfoString(es->str, ": ");
- }
- escape_json(es->str, objtype);
- break;
-
- case EXPLAIN_FORMAT_YAML:
- ExplainYAMLLineStarting(es);
- if (labelname)
- {
- escape_yaml(es->str, labelname);
- appendStringInfoString(es->str, ": ");
- }
- else
- {
- appendStringInfoString(es->str, "- ");
- }
- escape_yaml(es->str, objtype);
- break;
- }
-}
-
-/*
- * Emit the start-of-output boilerplate.
- *
- * This is just enough different from processing a subgroup that we need
- * a separate pair of subroutines.
- */
-void
-ExplainBeginOutput(ExplainState *es)
-{
- switch (es->format)
- {
- case EXPLAIN_FORMAT_TEXT:
- /* nothing to do */
- break;
-
- case EXPLAIN_FORMAT_XML:
- appendStringInfoString(es->str,
- "<explain xmlns=\"http://www.postgresql.org/2009/explain\">\n");
- es->indent++;
- break;
-
- case EXPLAIN_FORMAT_JSON:
- /* top-level structure is an array of plans */
- appendStringInfoChar(es->str, '[');
- es->grouping_stack = lcons_int(0, es->grouping_stack);
- es->indent++;
- break;
-
- case EXPLAIN_FORMAT_YAML:
- es->grouping_stack = lcons_int(0, es->grouping_stack);
- break;
- }
-}
-
-/*
- * Emit the end-of-output boilerplate.
- */
-void
-ExplainEndOutput(ExplainState *es)
-{
- switch (es->format)
- {
- case EXPLAIN_FORMAT_TEXT:
- /* nothing to do */
- break;
-
- case EXPLAIN_FORMAT_XML:
- es->indent--;
- appendStringInfoString(es->str, "</explain>");
- break;
-
- case EXPLAIN_FORMAT_JSON:
- es->indent--;
- appendStringInfoString(es->str, "\n]");
- es->grouping_stack = list_delete_first(es->grouping_stack);
- break;
-
- case EXPLAIN_FORMAT_YAML:
- es->grouping_stack = list_delete_first(es->grouping_stack);
- break;
- }
-}
-
-/*
- * Put an appropriate separator between multiple plans
- */
-void
-ExplainSeparatePlans(ExplainState *es)
-{
- switch (es->format)
- {
- case EXPLAIN_FORMAT_TEXT:
- /* add a blank line */
- appendStringInfoChar(es->str, '\n');
- break;
-
- case EXPLAIN_FORMAT_XML:
- case EXPLAIN_FORMAT_JSON:
- case EXPLAIN_FORMAT_YAML:
- /* nothing to do */
- break;
- }
-}
-
-/*
- * Emit opening or closing XML tag.
- *
- * "flags" must contain X_OPENING, X_CLOSING, or X_CLOSE_IMMEDIATE.
- * Optionally, OR in X_NOWHITESPACE to suppress the whitespace we'd normally
- * add.
- *
- * XML restricts tag names more than our other output formats, eg they can't
- * contain white space or slashes. Replace invalid characters with dashes,
- * so that for example "I/O Read Time" becomes "I-O-Read-Time".
- */
-static void
-ExplainXMLTag(const char *tagname, int flags, ExplainState *es)
-{
- const char *s;
- const char *valid = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.";
-
- if ((flags & X_NOWHITESPACE) == 0)
- appendStringInfoSpaces(es->str, 2 * es->indent);
- appendStringInfoCharMacro(es->str, '<');
- if ((flags & X_CLOSING) != 0)
- appendStringInfoCharMacro(es->str, '/');
- for (s = tagname; *s; s++)
- appendStringInfoChar(es->str, strchr(valid, *s) ? *s : '-');
- if ((flags & X_CLOSE_IMMEDIATE) != 0)
- appendStringInfoString(es->str, " /");
- appendStringInfoCharMacro(es->str, '>');
- if ((flags & X_NOWHITESPACE) == 0)
- appendStringInfoCharMacro(es->str, '\n');
-}
-
-/*
- * Emit a JSON line ending.
- *
- * JSON requires a comma after each property but the last. To facilitate this,
- * in JSON format, the text emitted for each property begins just prior to the
- * preceding line-break (and comma, if applicable).
- */
-static void
-ExplainJSONLineEnding(ExplainState *es)
-{
- Assert(es->format == EXPLAIN_FORMAT_JSON);
- if (linitial_int(es->grouping_stack) != 0)
- appendStringInfoChar(es->str, ',');
- else
- linitial_int(es->grouping_stack) = 1;
- appendStringInfoChar(es->str, '\n');
-}
-
-/*
- * Indent a YAML line.
- *
- * YAML lines are ordinarily indented by two spaces per indentation level.
- * The text emitted for each property begins just prior to the preceding
- * line-break, except for the first property in an unlabelled group, for which
- * it begins immediately after the "- " that introduces the group. The first
- * property of the group appears on the same line as the opening "- ".
- */
-static void
-ExplainYAMLLineStarting(ExplainState *es)
-{
- Assert(es->format == EXPLAIN_FORMAT_YAML);
- if (linitial_int(es->grouping_stack) == 0)
- {
- linitial_int(es->grouping_stack) = 1;
- }
- else
- {
- appendStringInfoChar(es->str, '\n');
- appendStringInfoSpaces(es->str, es->indent * 2);
- }
-}
-
-/*
- * YAML is a superset of JSON; unfortunately, the YAML quoting rules are
- * ridiculously complicated -- as documented in sections 5.3 and 7.3.3 of
- * http://yaml.org/spec/1.2/spec.html -- so we chose to just quote everything.
- * Empty strings, strings with leading or trailing whitespace, and strings
- * containing a variety of special characters must certainly be quoted or the
- * output is invalid; and other seemingly harmless strings like "0xa" or
- * "true" must be quoted, lest they be interpreted as a hexadecimal or Boolean
- * constant rather than a string.
- */
-static void
-escape_yaml(StringInfo buf, const char *str)
-{
- escape_json(buf, str);
-}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index d265c77..ebbcb83 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -22,6 +22,7 @@
#include "catalog/pg_type.h"
#include "commands/createas.h"
#include "commands/prepare.h"
+#include "commands/report.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "parser/analyze.h"
@@ -628,7 +629,7 @@ DropAllPreparedStatements(void)
* not the original PREPARE; we get the latter string from the plancache.
*/
void
-ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
+ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ReportState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv)
{
@@ -692,7 +693,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(p) != NULL)
- ExplainSeparatePlans(es);
+ ReportSeparatePlans(es);
}
if (estate)
diff --git a/src/backend/commands/progress.c b/src/backend/commands/progress.c
new file mode 100644
index 0000000..e427b8f
--- /dev/null
+++ b/src/backend/commands/progress.c
@@ -0,0 +1,1314 @@
+/*
+ * progress.c
+ * Monitor the progress of a long running query: PROGRESS command
+ *
+ * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/progress.c
+ */
+
+#include "postgres.h"
+
+#include <signal.h>
+#include <unistd.h>
+#include <sys/stat.h>
+
+#include "nodes/nodes.h"
+#include "tcop/dest.h"
+#include "catalog/pg_type.h"
+#include "nodes/extensible.h"
+#include "nodes/nodeFuncs.h"
+#include "parser/parsetree.h"
+#include "executor/progress.h"
+#include "access/xact.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/lmgr.h"
+#include "storage/latch.h"
+#include "storage/procsignal.h"
+#include "storage/backendid.h"
+#include "executor/execdesc.h"
+#include "executor/executor.h"
+#include "executor/hashjoin.h"
+#include "commands/defrem.h"
+#include "commands/report.h"
+#include "access/relscan.h"
+#include "utils/memutils.h"
+#include "utils/lsyscache.h"
+#include "utils/builtins.h"
+#include "utils/json.h"
+#include "utils/tuplesort.h"
+#include "utils/tuplestore.h"
+#include "storage/buffile.h"
+#include "utils/ruleutils.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+
+static int log_stmt = 0; /* log query monitored */
+static int debug = 0;
+
+/*
+ * One ProgressCtl is allocated for each backend process which may potentially be monitored.
+ * The array of progress_ctl structures is protected by the ProgressLock global lock.
+ *
+ * Only one backend can be monitored at a time. This could be improved with a finer granularity,
+ * using a LWLock tranche of MAX_NR_BACKENDS locks, in which case each backend could be monitored
+ * independently of the other backends.
+ *
+ * The LWLock ensures that a backend can only be monitored by one other backend at a time.
+ * Other backends trying to monitor an already monitored backend are queued on the LWLock.
+ */
+typedef struct ProgressCtl {
+ ReportFormat format; /* format of the progress response to be delivered */
+
+ /*
+ * options
+ */
+ bool verbose; /* be verbose */
+ bool buffers; /* print buffer usage */
+ bool timing; /* print detailed node timing */
+
+
+ char* buf; /* progress status report in shm */
+ struct Latch* latch; /* Used by requestor to wait for backend to complete its report */
+} ProgressCtl;
+
+struct ProgressCtl* progress_ctl_array; /* Array of MaxBackends ProgressCtl */
+char* dump_buf_array; /* SHMEM buffers one for each backend */
+struct Latch* resp_latch_array; /* Array of MaxBackends latches to synchronize response from
+ * monitored backend to monitoring backend */
+
+/*
+ * No progress request is pending unless explicitly requested.
+ */
+volatile bool progress_requested = false;
+
+/*
+ * get options and tupledesc for result
+ */
+static void ProgressGetOptions(ProgressCtl* prg, ProgressStmt* stmt, ParseState* pstate);
+static TupleDesc ProgressResultDesc(ProgressCtl* prg);
+
+/*
+ * local functions
+ */
+static void ProgressPlan(QueryDesc* query, ReportState* ps);
+static void ProgressNode(PlanState* planstate, List* ancestors,
+ const char* relationship, const char* plan_name, ReportState* ps);
+
+/*
+ * Individual nodes of interest are:
+ * - scan data: for heap or index
+ * - sort data: for any relation or tuplestore
+ * Other nodes only wait on above nodes
+ */
+static void ProgressScanBlks(ScanState* ss, ReportState* ps);
+static void ProgressScanRows(Scan* plan, PlanState* plantstate, ReportState* ps);
+static void ProgressTidScan(TidScanState* ts, ReportState* ps);
+static void ProgressLimit(LimitState* ls, ReportState* ps);
+static void ProgressCustomScan(CustomScanState* cs, ReportState* ps);
+static void ProgressIndexScan(IndexScanState* is, ReportState* ps);
+static void ProgressModifyTable(ModifyTableState * planstate, ReportState* ps);
+static void ProgressHashJoin(HashJoinState* planstate, ReportState* ps);
+static void ProgressHash(HashState* planstate, ReportState* ps);
+static void ProgressHashJoinTable(HashJoinTable hashtable, ReportState* ps);
+static void ProgressBufFileRW(BufFile* bf, ReportState* ps, unsigned long *reads, unsigned long * writes);
+static void ProgressBufFile(BufFile* bf, ReportState* ps);
+static void ProgressMaterial(MaterialState* planstate, ReportState* ps);
+static void ProgressTupleStore(Tuplestorestate* tss, ReportState* ps);
+static void ProgressAgg(AggState* planstate, ReportState* ps);
+static void ProgressSort(SortState* ss, ReportState* ps);
+static void ProgressTupleSort(Tuplesortstate* tss, ReportState* ps);
+static void dumpTapes(struct ts_report* tsr, ReportState* ps);
+
+
+Size ProgressShmemSize(void)
+{
+ Size size;
+
+ /* Must match ProgressShmemInit */
+ size = mul_size(MaxBackends, sizeof(ProgressCtl));
+ size = add_size(size, mul_size(MaxBackends, PROGRESS_AREA_SIZE));
+ size = add_size(size, mul_size(MaxBackends, sizeof(struct Latch)));
+
+ return size;
+}
+
+/*
+ * Initialize our shared memory area
+ */
+void ProgressShmemInit(void)
+{
+ bool found;
+ size_t size = 0;
+
+ /* Allocate shared latches for responses to progress requests */
+ size = mul_size(MaxBackends, sizeof(struct Latch));
+ resp_latch_array = ShmemInitStruct("Progress latches", size, &found);
+ if (!found) {
+ int i;
+ struct Latch* l;
+
+ l = resp_latch_array;
+ for (i = 0; i < MaxBackends; i++) {
+ InitSharedLatch(l);
+ l++;
+ }
+ }
+
+ /* Allocate SHMEM buffers for backend communication */
+ size = MaxBackends * PROGRESS_AREA_SIZE;
+ dump_buf_array = (char*) ShmemInitStruct("Backend Dump Pages", size, &found);
+ if (!found) {
+ memset(dump_buf_array, 0, size);
+ }
+
+ /* Allocate progress request meta data, one for each backend */
+ size = mul_size(MaxBackends, sizeof(ProgressCtl));
+ progress_ctl_array = ShmemInitStruct("ProgressCtl array", size, &found);
+ if (!found) {
+ int i;
+ ProgressCtl* req;
+ struct Latch* latch;
+
+ req = progress_ctl_array;
+ latch = resp_latch_array;
+ for (i = 0; i < MaxBackends; i++) {
+ /* Already zeroed above */
+ memset(req, 0, sizeof(ProgressCtl));
+
+ /* set default value */
+ req->format = REPORT_FORMAT_TEXT;
+ req->latch = latch;
+ req->buf = dump_buf_array + i * PROGRESS_AREA_SIZE;
+ req++;
+ latch++;
+ }
+ }
+
+ return;
+}
+
+/*
+ * Each backend needs to have its own progress_state
+ */
+void ProgressBackendInit(void)
+{
+ //progress_state = CreateReportState(0);
+}
+
+void ProgressBackendExit(int code, Datum arg)
+{
+ //FreeReportState(progress_state);
+}
+
+/*
+ * ProgressSendRequest:
+ * Log a request to a backend in order to fetch its progress log
+ * This is initiated by the SQL command: PROGRESS pid.
+ */
+void ProgressSendRequest(
+ ParseState* pstate,
+ ProgressStmt *stmt,
+ DestReceiver* dest)
+{
+ BackendId bid;
+ ProgressCtl* req; // Used for the request
+ TupOutputState* tstate;
+ char* buf;
+ MemoryContext local_context;
+ MemoryContext old_context;
+
+ /* Convert pid to backend_id */
+ bid = ProcPidGetBackendId(stmt->pid);
+ if (bid == InvalidBackendId) {
+ ereport(ERROR, (
 errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 errmsg("invalid backend process pid")));
+ }
+
+ if (stmt->pid == getpid()) {
+ ereport(ERROR, (
 errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 errmsg("cannot request status from self")));
+ }
+
+ /* Collect progress state from monitored backend str data */
+ local_context = AllocSetContextCreate(CurrentMemoryContext, "ProgressState", ALLOCSET_DEFAULT_SIZES);
+ old_context = MemoryContextSwitchTo(local_context);
+
+ /* Allocate buf for local work */
+ buf = palloc0(PROGRESS_AREA_SIZE);
+ MemoryContextSwitchTo(old_context);
+
+ /* Serialize signals/request to get the progress state of the query */
+ LWLockAcquire(ProgressLock, LW_EXCLUSIVE);
+
+ req = progress_ctl_array + bid;
+ ProgressGetOptions(req, stmt, pstate);
+
+ OwnLatch(req->latch);
+ ResetLatch(req->latch);
+
+ SendProcSignal(stmt->pid, PROCSIG_PROGRESS, bid);
+ WaitLatch(req->latch, WL_LATCH_SET, 0, WAIT_EVENT_PROGRESS);
+ DisownLatch(req->latch);
+
+ /* Fetch result and clear SHM buffer */
+ memcpy(buf, req->buf, strlen(req->buf));
+ memset(req->buf, 0, PROGRESS_AREA_SIZE);
+
+ /* End serialization */
+ LWLockRelease(ProgressLock);
+
+ /* Send response to client */
+ tstate = begin_tup_output_tupdesc(dest, ProgressResultDesc(req));
+ if (req->format == REPORT_FORMAT_TEXT)
+ do_text_output_multiline(tstate, buf);
+ else
+ do_text_output_oneline(tstate, buf);
+
+ end_tup_output(tstate);
+
+ MemoryContextDelete(local_context); // pfree(buf);
+}
+
+static void ProgressGetOptions(ProgressCtl* req, ProgressStmt* stmt, ParseState* pstate)
+{
+ unsigned short result_type;
+ ListCell* lc;
+
+ /* default options */
+ req->format = REPORT_FORMAT_TEXT;
+ req->verbose = 0;
+ req->buffers = 0;
+ req->timing = 0;
+
+ /*
+ * Check for format option
+ */
+ foreach (lc, stmt->options) {
+ DefElem* opt = (DefElem*) lfirst(lc);
+
+ if (strcmp(opt->defname, "format") == 0) {
+ char* p = defGetString(opt);
+
+ if (strcmp(p, "xml") == 0) {
+ result_type = REPORT_FORMAT_XML;
+ } else if (strcmp(p, "json") == 0) {
+ result_type = REPORT_FORMAT_JSON;
+ } else if (strcmp(p, "yaml") == 0) {
+ result_type = REPORT_FORMAT_YAML;
+ } else if (strcmp(p, "text") == 0) {
+ result_type = REPORT_FORMAT_TEXT;
+ } else {
+ ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 errmsg("unrecognized value for PROGRESS option \"%s\": \"%s\"",
+ opt->defname, p), parser_errposition(pstate, opt->location)));
+ }
+ req->format = result_type;
+ } else if (strcmp(opt->defname, "verbose") == 0) {
+ req->verbose = defGetBoolean(opt);
+ } else {
+ ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR),
 errmsg("unrecognized PROGRESS option \"%s\"", opt->defname),
+ parser_errposition(pstate, opt->location)));
+ }
+ }
+}
+
+static TupleDesc ProgressResultDesc(
+ ProgressCtl* prg)
+{
+ TupleDesc tupdesc;
+ Oid result_type = TEXTOID;
+
+ switch(prg->format) {
+ case REPORT_FORMAT_XML:
+ result_type = XMLOID;
+ break;
+ case REPORT_FORMAT_JSON:
+ result_type = JSONOID;
+ break;
+ default:
+ result_type = TEXTOID;
+ /* No YAMLOID */
+ }
+
+ /*
+ * Need a tuple descriptor representing a single TEXT or XML column
+ */
+ tupdesc = CreateTemplateTupleDesc(1, false);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 1, "PLAN PROGRESS", (Oid) result_type, -1, 0);
+
+ return tupdesc;
+}
+
+void HandleProgressSignal(void)
+{
+ progress_requested = true;
+ InterruptPending = true;
+}
+
+void HandleProgressRequest(void)
+{
+ ProgressCtl* req;
+ ReportState* ps;
+ MemoryContext oldcontext;
+ MemoryContext progress_context;
+ char* shmBufferTooShort = "shm buffer is too small";
+
+ //HOLD_INTERRUPTS();
+
+ progress_context = AllocSetContextCreate(CurrentMemoryContext, "ReportState", ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(progress_context);
+
+ ps = CreateReportState(0);
+ ps->memcontext = progress_context;
+
+ Assert(ps != NULL);
+ Assert(ps->str != NULL);
+ Assert(ps->str->data != NULL);
+
+ req = progress_ctl_array + MyBackendId;
+
+ /* Clear previous content of ps->str */
+ ps->format = req->format;
+ ps->verbose = req->verbose;
+ ps->buffers = req->buffers;
+ ps->timing = req->timing;
+ resetStringInfo(ps->str);
+
+ if (MyQueryDesc == NULL) {
+ appendStringInfo(ps->str, "<idle backend>\n");
+ goto out;
+ }
+
+ if (!IsTransactionState()) {
+ appendStringInfo(ps->str, "<out of transaction>\n");
+ goto out;
+ }
+
+ if (MyQueryDesc->plannedstmt == NULL) {
+ appendStringInfo(ps->str, "<NULL planned statement>");
+ goto out;
+ }
+
+ if (MyQueryDesc->plannedstmt->commandType == CMD_UTILITY) {
+ appendStringInfo(ps->str, "<utility statement>\n");
+ goto out;
+ }
+
+ if (log_stmt) {
+ appendStringInfo(ps->str, "QUERY: %s", MyQueryDesc->sourceText);
+ appendStringInfoChar(ps->str, '\n');
+ }
+
+ ReportBeginOutput(ps);
+ ProgressPlan(MyQueryDesc, ps);
+ ReportEndOutput(ps);
+
+out:
+ /* Dump in SHM the string buffer content */
+ if (strlen(ps->str->data) < PROGRESS_AREA_SIZE) {
+ memcpy(req->buf, ps->str->data, strlen(ps->str->data));
+ } else {
+ memcpy(req->buf, shmBufferTooShort, strlen(shmBufferTooShort));
+ elog(LOG, "Needed size for buffer %d", (int) strlen(ps->str->data));
+ elog(LOG, "Buffer %s", ps->str->data);
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextDelete(ps->memcontext);
+
+ SetLatch(req->latch); // Notify of progress state delivery
+
+ //RESUME_INTERRUPTS();
+}
+
+static void ProgressPlan(
+ QueryDesc* query,
+ ReportState* ps)
+{
+ Bitmapset* rels_used = NULL;
+ PlanState* planstate;
+
+ /*
+ * Set up ReportState fields associated with this plan tree
+ */
+ Assert(query->plannedstmt != NULL);
+
+ /* Top level tree data */
+ ps->plan = query->plannedstmt->planTree;
+ ps->planstate = query->planstate;
+ ps->es = query->estate;
+
+ ps->pstmt = query->plannedstmt;
+ ps->rtable = query->plannedstmt->rtable;
+
+ ReportPreScanNode(query->planstate, &rels_used);
+
+ ps->rtable_names = select_rtable_names_for_explain(ps->rtable, rels_used);
+ ps->deparse_cxt = deparse_context_for_plan_rtable(ps->rtable, ps->rtable_names);
+ ps->printed_subplans = NULL;
+
+ planstate = query->planstate;
+ if (IsA(planstate, GatherState) && ((Gather*) planstate->plan)->invisible) {
+ planstate = outerPlanState(planstate);
+ }
+
+ ProgressNode(planstate, NIL, NULL, NULL, ps);
+}
+
+/*
+ * This is the main workhorse for collecting query execution progress.
+ *
+ * planstate is the current execution state in the global execution tree.
+ * relationship describes the relationship of this plan state to its parent:
+ * "outer" or "inner". It is NULL at top level.
+ */
+static void ProgressNode(
+ PlanState* planstate,
+ List* ancestors,
+ const char* relationship,
+ const char* plan_name,
+ ReportState* ps)
+{
+ Plan* plan = planstate->plan;
+ PlanInfo info;
+ int save_indent = ps->indent;
+ bool haschildren;
+ int ret;
+
+ if (debug)
+ elog(LOG, "=> %s", nodeToString(plan));
+
+ /*
+ * 1st step: display the node type
+ */
+ ret = planNodeInfo(plan, &info);
+ if (ret != 0) {
+ elog(LOG, "unknown node type for plan");
+ }
+
+ ReportOpenGroup("Progress", relationship ? NULL : "Progress", true, ps);
+ ReportProperties(plan, &info, plan_name, relationship, ps);
+
+ /*
+ * Second step
+ */
+ switch(nodeTag(plan)) {
+ case T_SeqScan: // ScanState
+ case T_SampleScan: // ScanState
+ case T_BitmapHeapScan: // ScanState
+ case T_SubqueryScan: // ScanState
+ case T_FunctionScan: // ScanState
+ case T_ValuesScan: // ScanState
+ case T_CteScan: // ScanState
+ case T_WorkTableScan: // ScanState
+ ProgressScanRows((Scan*) plan, planstate, ps);
+ ProgressScanBlks((ScanState*) planstate, ps);
+ break;
+
+ case T_TidScan: // ScanState
+ ProgressTidScan((TidScanState*) planstate, ps);
+ ProgressScanBlks((ScanState*) planstate, ps);
+ break;
+
+ case T_Limit: // PlanState
+ ProgressLimit((LimitState*) planstate, ps);
+ break;
+
+ case T_ForeignScan: // ScanState
+ case T_CustomScan: // ScanState
+ ProgressCustomScan((CustomScanState*) planstate, ps);
+ ProgressScanRows((Scan*) plan, planstate, ps);
+ break;
+
+ case T_IndexScan: // ScanState
+ case T_IndexOnlyScan: // ScanState
+ case T_BitmapIndexScan: // ScanState
+ ProgressScanBlks((ScanState*) planstate, ps);
+ ProgressIndexScan((IndexScanState*) planstate, ps);
+ break;
+
+ case T_ModifyTable: // PlanState
+ /*
+ * Handled below via the mt_plans array of PlanState nodes
+ */
+ ProgressModifyTable((ModifyTableState *) planstate, ps);
+ break;
+
+ case T_NestLoop: // JoinState (includes a Planstate)
+ case T_MergeJoin: // JoinState (includes a Planstate)
+ /*
+ * These nodes do not perform long operations; they only join.
+ */
+ break;
+
+ case T_HashJoin: { // JoinState (includes a Planstate)
+ /*
+ * uses a HashJoin with BufFile
+ */
+ const char* jointype;
+
+ switch (((Join*) plan)->jointype) {
+ case JOIN_INNER:
+ jointype = "Inner";
+ break;
+
+ case JOIN_LEFT:
+ jointype = "Left";
+ break;
+
+ case JOIN_FULL:
+ jointype = "Full";
+ break;
+
+ case JOIN_RIGHT:
+ jointype = "Right";
+ break;
+
+ case JOIN_SEMI:
+ jointype = "Semi";
+ break;
+
+ case JOIN_ANTI:
+ jointype = "Anti";
+ break;
+
+ default:
+ jointype = "???";
+ break;
+ }
+
+ if (ps->format == REPORT_FORMAT_TEXT) {
+ /*
+ * For historical reasons, the join type is interpolated
+ * into the node type name...
+ */
+ if (((Join*) plan)->jointype != JOIN_INNER) {
+ appendStringInfo(ps->str, " %s Join", jointype);
+ } else if (!IsA(plan, NestLoop)) {
+ appendStringInfoString(ps->str, " Join");
+ }
+ } else {
+ ReportPropertyText("Join Type", jointype, ps);
+ }
+
+ }
+
+ ProgressHashJoin((HashJoinState*) planstate, ps);
+ break;
+
+ case T_SetOp: { // PlanState
+ /*
+ * Only uses an in-memory hash table
+ */
+ const char* setopcmd;
+
+ switch (((SetOp*) plan)->cmd) {
+ case SETOPCMD_INTERSECT:
+ setopcmd = "Intersect";
+ break;
+
+ case SETOPCMD_INTERSECT_ALL:
+ setopcmd = "Intersect All";
+ break;
+
+ case SETOPCMD_EXCEPT:
+ setopcmd = "Except";
+ break;
+
+ case SETOPCMD_EXCEPT_ALL:
+ setopcmd = "Except All";
+ break;
+
+ default:
+ setopcmd = "???";
+ break;
+ }
+
+ if (ps->format == REPORT_FORMAT_TEXT) {
+ appendStringInfo(ps->str, " %s", setopcmd);
+ } else {
+ ReportPropertyText("Command", setopcmd, ps);
+ }
+
+ }
+ break;
+
+ case T_Sort: // ScanState
+ ProgressSort((SortState*) planstate, ps);
+ break;
+
+ case T_Material: // ScanState
+ /*
+ * Uses: ScanState and Tuplestorestate
+ */
+ ProgressMaterial((MaterialState*) planstate, ps);
+ ProgressScanBlks((ScanState*) planstate, ps);
+ break;
+
+ case T_Group: // ScanState
+ ProgressScanBlks((ScanState*) planstate, ps);
+ break;
+
+ case T_Agg: // ScanState
+ /*
+ * Uses tuplesortstate twice.
+ * Not reflected in child nodes.
+ */
+ ProgressAgg((AggState*) planstate, ps);
+ break;
+
+ case T_WindowAgg: // ScanState
+ // Has a Tuplestorestate (field buffer)
 ProgressTupleStore(((WindowAggState*) planstate)->buffer, ps);
+ break;
+
+ case T_Unique: // PlanState
+ /*
+ * Does not store any tuples.
+ * Just fetches each tuple and compares it with the previous one.
+ */
+ break;
+
+ case T_Gather: // PlanState
+ /*
+ * Does not store any tuple.
+ */
+ break;
+
+ case T_Hash: // PlanState
+ /*
+ * May have on-file hash data
+ */
+ ProgressHash((HashState*) planstate, ps);
+ break;
+
+ case T_LockRows: // PlanState
+ /*
+ * Only stores tuples in an in-memory array
+ */
+ break;
+
+ default:
+ break;
+ }
+
+ /*
+ * In text format, first line ends here
+ */
+ ReportNewLine(ps);
+
+ /*
+ * Target list
+ */
+ if (ps->verbose)
+ show_plan_tlist(planstate, ancestors, ps);
+
+ /*
+ * Controls (sort, qual, ...)
+ */
+ show_control_qual(planstate, ancestors, ps);
+
+ /*
+ * Get ready to display the child plans.
+ * Pass current PlanState as head of ancestors list for children
+ */
+ haschildren = ReportHasChildren(plan, planstate);
+ if (haschildren) {
+ ReportOpenGroup("Progress", "Progress", false, ps);
+ ancestors = lcons(planstate, ancestors);
+ }
+
+ /*
+ * initPlan-s
+ */
+ if (planstate->initPlan) {
+ ReportSubPlans(planstate->initPlan, ancestors, "InitPlan", ps, ProgressNode);
+ }
+
+ /*
+ * lefttree
+ */
+ if (outerPlanState(planstate)) {
+ ProgressNode(outerPlanState(planstate), ancestors, "Outer", NULL, ps);
+ }
+
+ /*
+ * righttree
+ */
+ if (innerPlanState(planstate)) {
+ ProgressNode(innerPlanState(planstate), ancestors, "Inner", NULL, ps);
+ }
+
+ /*
+ * special child plans
+ */
+ switch (nodeTag(plan)) {
+ case T_ModifyTable:
+ ReportMemberNodes(((ModifyTable*) plan)->plans,
+ ((ModifyTableState*) planstate)->mt_plans, ancestors, ps, ProgressNode);
+ break;
+
+ case T_Append:
+ ReportMemberNodes(((Append*) plan)->appendplans,
+ ((AppendState*) planstate)->appendplans, ancestors, ps, ProgressNode);
+ break;
+
+ case T_MergeAppend:
+ ReportMemberNodes(((MergeAppend*) plan)->mergeplans,
+ ((MergeAppendState*) planstate)->mergeplans, ancestors, ps, ProgressNode);
+ break;
+
+ case T_BitmapAnd:
+ ReportMemberNodes(((BitmapAnd*) plan)->bitmapplans,
+ ((BitmapAndState*) planstate)->bitmapplans, ancestors, ps, ProgressNode);
+ break;
+
+ case T_BitmapOr:
+ ReportMemberNodes(((BitmapOr*) plan)->bitmapplans,
+ ((BitmapOrState*) planstate)->bitmapplans, ancestors, ps, ProgressNode);
+ break;
+
+ case T_SubqueryScan:
+ ProgressNode(((SubqueryScanState*) planstate)->subplan, ancestors,
+ "Subquery", NULL, ps);
+ break;
+
+ case T_CustomScan:
+ ReportCustomChildren((CustomScanState*) planstate, ancestors, ps, ProgressNode);
+ break;
+
+ default:
+ break;
+ }
+
+ /*
+ * subPlan-s
+ */
+ if (planstate->subPlan) {
+ ReportSubPlans(planstate->subPlan, ancestors, "SubPlan", ps, ProgressNode);
+ }
+
+ /*
+ * end of child plans
+ */
+ if (haschildren) {
+ ancestors = list_delete_first(ancestors);
+ ReportCloseGroup("Progress", "Progress", false, ps);
+ }
+
+ /*
+ * in text format, undo whatever indentation we added
+ */
+ if (ps->format == REPORT_FORMAT_TEXT) {
+ ps->indent = save_indent;
+ }
+
+ ReportCloseGroup("Progress", relationship ? NULL : "Plan", true, ps);
+}
+
+
+/**********************************************************************************
+ * Individual progress report functions for the different execution nodes start here
+ **********************************************************************************/
+
+
+/*
+ * Report scan progress in heap blocks, based on the HeapScanDesc
+ */
+static
+void ProgressScanBlks(ScanState* ss, ReportState* ps)
+{
+ HeapScanDesc hsd;
+ ParallelHeapScanDesc phsd;
+ unsigned int nr_blks;
+
+ hsd = ss->ss_currentScanDesc;
+ if (hsd == NULL) {
+ return;
+ }
+
+ phsd = hsd->rs_parallel;
+
+	if (phsd != NULL) {
+ if (phsd->phs_nblocks != 0 && phsd->phs_cblock != InvalidBlockNumber) {
+ if (phsd->phs_cblock > phsd->phs_startblock)
+ nr_blks = phsd->phs_cblock - phsd->phs_startblock;
+ else
+ nr_blks = phsd->phs_cblock + phsd->phs_nblocks - phsd->phs_startblock;
+
+ appendStringInfo(ps->str, " blks %u/%u %u%%",
+ nr_blks, phsd->phs_nblocks,
+ 100 * nr_blks/(phsd->phs_nblocks));
+ }
+ } else {
+ /* Not a parallel query */
+ if (hsd->rs_nblocks != 0 && hsd->rs_cblock != InvalidBlockNumber) {
+ if (hsd->rs_cblock > hsd->rs_startblock)
+ nr_blks = hsd->rs_cblock - hsd->rs_startblock;
+ else
+ nr_blks = hsd->rs_cblock + hsd->rs_nblocks - hsd->rs_startblock;
+
+ appendStringInfo(ps->str, " blks %u/%u %u%%",
+ nr_blks, hsd->rs_nblocks,
+ 100 * nr_blks/(hsd->rs_nblocks));
+ }
+ }
+}
+
+static
+void ProgressScanRows(Scan* plan, PlanState* planstate, ReportState* ps)
+{
+ Index rti;
+ RangeTblEntry* rte;
+ char* objectname;
+
+ rti = plan->scanrelid;
+ rte = rt_fetch(rti, ps->rtable);
+
+ if (ps->format == REPORT_FORMAT_TEXT)
+ appendStringInfo(ps->str, " on");
+
+ objectname = get_rel_name(rte->relid);
+ if (objectname != NULL) {
+ appendStringInfo(ps->str, " %s", quote_identifier(objectname));
+ }
+
+ if (ps->format == REPORT_FORMAT_TEXT) {
+ appendStringInfo(ps->str, " => rows %ld/%ld %d%%",
+ (long int) planstate->plan_rows, (long int) plan->plan.plan_rows,
+ (unsigned short) planstate->percent_done);
+ }
+}
+
+static
+void ProgressTidScan(TidScanState* ts, ReportState* ps)
+{
+ unsigned int percent;
+
+ if (ts == NULL) {
+ return;
+ }
+
+ if (ts->tss_NumTids == 0)
+ percent = 0;
+ else
+ percent = (unsigned short)(100 * (ts->tss_TidPtr) / (ts->tss_NumTids));
+
+ if (ps->format == REPORT_FORMAT_TEXT) {
+ appendStringInfo(ps->str, " => rows %ld/%ld %d%%",
+ (long int) ts->tss_TidPtr, (long int) ts->tss_NumTids, percent);
+ }
+}
+
+static
+void ProgressLimit(LimitState* ls, ReportState* ps)
+{
+ if (ps->format == REPORT_FORMAT_TEXT) {
+ if (ls->position == 0) {
+ appendStringInfoSpaces(ps->str, ps->indent * 2);
+ appendStringInfo(ps->str, " => offset 0%% limit 0%%");
+ return;
+ }
+
+ if (ls->position > 0 && ls->position <= ls->offset) {
+ appendStringInfoSpaces(ps->str, ps->indent * 2);
+ appendStringInfo(ps->str, " => offset %d%% limit 0%%",
+ (unsigned short)(100 * (ls->position)/(ls->offset)));
+ return;
+ }
+
+		if (ls->position > ls->offset && ls->count > 0) {
+ appendStringInfoSpaces(ps->str, ps->indent * 2);
+ appendStringInfo(ps->str, " => offset 100%% limit %d%%",
+ (unsigned short)(100 * (ls->position - ls->offset)/(ls->count)));
+ return;
+ }
+ }
+}
+
+static
+void ProgressCustomScan(CustomScanState* cs, ReportState* ps)
+{
+ if (cs->methods->ProgressCustomScan) {
+ cs->methods->ProgressCustomScan(cs, NULL, ps);
+ }
+}
+
+static
+void ProgressIndexScan(IndexScanState* is, ReportState* ps)
+{
+ PlanState planstate;
+ Plan* p;
+
+ if (is == NULL) {
+ return;
+ }
+
+ planstate = is->ss.ps;
+ p = planstate.plan;
+ if (p == NULL) {
+ return;
+ }
+
+ if (ps->format == REPORT_FORMAT_TEXT) {
+ appendStringInfo(ps->str, " => rows %ld/%ld %d%%",
+ (long int) planstate.plan_rows, (long int) p->plan_rows,
+ (unsigned short) planstate.percent_done);
+ }
+}
+
+static
+void ProgressModifyTable(ModifyTableState * mts, ReportState* ps)
+{
+ EState* es;
+
+ if (mts == NULL)
+ return;
+
+ es = mts->ps.state;
+ if (es == NULL)
+ return;
+
+ if (ps->format == REPORT_FORMAT_TEXT) {
+ appendStringInfo(ps->str, " => rows %ld", (long int) es->es_processed);
+ }
+}
+
+static
+void ProgressHash(HashState* planstate, ReportState* ps)
+{
+ if (planstate == NULL)
+ return;
+
+ ProgressHashJoinTable((HashJoinTable) planstate->hashtable, ps);
+}
+
+static
+void ProgressHashJoin(HashJoinState* planstate, ReportState* ps)
+{
+ if (planstate == NULL)
+ return;
+
+ ProgressHashJoinTable((HashJoinTable) planstate->hj_HashTable, ps);
+}
+
+/*
+ * HashJoinTable is not a node type
+ */
+static
+void ProgressHashJoinTable(HashJoinTable hashtable, ReportState* ps)
+{
+ int i;
+ unsigned long reads;
+ unsigned long writes;
+
+ unsigned long lreads;
+ unsigned long lwrites;
+
+	/* The hash table may be used but not yet allocated */
+ if (hashtable == NULL)
+ return;
+
+ if (hashtable->nbatch <= 1)
+ return;
+
+ appendStringInfo(ps->str, " hashtable nbatch %d", hashtable->nbatch);
+
+ /* Display global reads and writes */
+ reads = 0;
+ writes = 0;
+ for (i = 0; i < hashtable->nbatch; i++) {
+ if (hashtable->innerBatchFile[i]) {
+ ProgressBufFileRW(hashtable->innerBatchFile[i], ps, &lreads, &lwrites);
+ reads += lreads;
+ writes += lwrites;
+ }
+
+ if (hashtable->outerBatchFile[i]) {
+ ProgressBufFileRW(hashtable->outerBatchFile[i], ps, &lreads, &lwrites);
+ reads += lreads;
+ writes += lwrites;
+ }
+ }
+ appendStringInfo(ps->str, " kbytes read/write %ld/%ld",
+ reads/1024, writes/1024);
+
+ /* Only display details if requested */
+ if (ps->verbose == false)
+ return;
+
+ ps->indent++;
+ for (i = 0; i < hashtable->nbatch; i++) {
+ appendStringInfoSpaces(ps->str, ps->indent * 2);
+ appendStringInfo(ps->str, "batch %d\n", i);
+ if (hashtable->innerBatchFile[i]) {
+ ps->indent++;
+ appendStringInfoSpaces(ps->str, ps->indent * 2);
+ appendStringInfo(ps->str, "inner ");
+ ProgressBufFile(hashtable->innerBatchFile[i], ps);
+ ps->indent--;
+ }
+
+ if (hashtable->outerBatchFile[i]) {
+ ps->indent++;
+ appendStringInfoSpaces(ps->str, ps->indent * 2);
+ appendStringInfo(ps->str, "outer ");
+ ProgressBufFile(hashtable->outerBatchFile[i], ps);
+ ps->indent--;
+ }
+ }
+ ps->indent--;
+}
+
+static
+void ProgressBufFileRW(BufFile* bf, ReportState* ps,
+ unsigned long *reads, unsigned long * writes)
+{
+ MemoryContext oldcontext;
+ struct buffile_state* bfs;
+ int i;
+
+ *reads = 0;
+ *writes = 0;
+
+ oldcontext = MemoryContextSwitchTo(ps->memcontext);
+ bfs = BufFileState(bf);
+ MemoryContextSwitchTo(oldcontext);
+
+ for (i = 0; i < bfs->numFiles; i++) {
+ *reads += bfs->bytes_read[i];
+ *writes += bfs->bytes_write[i];
+ }
+}
+
+static
+void ProgressBufFile(BufFile* bf, ReportState* ps)
+{
+ int i;
+ struct buffile_state* bfs;
+ MemoryContext oldcontext;
+
+ oldcontext = MemoryContextSwitchTo(ps->memcontext);
+ bfs = BufFileState(bf);
+ MemoryContextSwitchTo(oldcontext);
+
+ appendStringInfo(ps->str, "buffile with %d files\n", bfs->numFiles);
+ ps->indent++;
+ for (i = 0; i < bfs->numFiles; i++) {
+ appendStringInfoSpaces(ps->str, ps->indent * 2);
+ appendStringInfo(ps->str, "file %d r/w (kbytes) %d/%d\n",
+ i, bfs->bytes_read[i]/1024, bfs->bytes_write[i]/1024);
+ }
+ ps->indent--;
+}
+
+static
+void ProgressMaterial(MaterialState* planstate, ReportState* ps)
+{
+ Tuplestorestate* tss;
+
+ tss = planstate->tuplestorestate;
+ ProgressTupleStore(tss, ps);
+}
+
+/*
+ * Tuplestorestate is not a node type
+ */
+static
+void ProgressTupleStore(Tuplestorestate* tss, ReportState* ps)
+{
+ struct tss_report tssr;
+
+ if (tss == NULL)
+ return;
+
+ tuplestore_get_state(tss, &tssr);
+
+ switch (tssr.status) {
+ case TSS_INMEM:
+		appendStringInfo(ps->str, " => memory tuples (write=%ld",
+ (long int) tssr.memtupcount);
+ if (tssr.memtupskipped > 0) {
+ appendStringInfo(ps->str, " skipped=%ld", (long int) tssr.memtupskipped);
+ }
+
+ appendStringInfo(ps->str, " read=%ld", (long int) tssr.memtupread);
+		if (tssr.memtupdeleted) {
+			appendStringInfo(ps->str, " deleted=%ld", (long int) tssr.memtupdeleted);
+ }
+
+ appendStringInfo(ps->str, ")");
+ break;
+
+ case TSS_WRITEFILE:
+ case TSS_READFILE:
+ appendStringInfo(ps->str, " => file");
+ if (tssr.status == TSS_WRITEFILE)
+ appendStringInfo(ps->str, " write");
+ else
+ appendStringInfo(ps->str, " read");
+
+ appendStringInfo(ps->str, " readptrcount=%d", tssr.readptrcount);
+ appendStringInfo(ps->str, " tuples (");
+ appendStringInfo(ps->str, "write=%ld", (long int ) tssr.tuples_count);
+ if (tssr.tuples_skipped > 0) {
+ appendStringInfo(ps->str, " skipped=%ld", (long int) tssr.tuples_skipped);
+ }
+
+ appendStringInfo(ps->str, " read=%ld", (long int) tssr.tuples_read);
+ if (tssr.tuples_deleted) {
+ appendStringInfo(ps->str, " deleted=%ld", (long int ) tssr.tuples_deleted);
+ }
+
+ appendStringInfo(ps->str, ")");
+ break;
+
+ default:
+ break;
+ }
+}
+
+static
+void ProgressAgg(AggState* planstate, ReportState* ps)
+{
+ if (planstate == NULL)
+ return;
+
+ ProgressTupleSort(planstate->sort_in, ps);
+ ProgressTupleSort(planstate->sort_out, ps);
+}
+
+static
+void ProgressSort(SortState* ss, ReportState* ps)
+{
+	if (ss == NULL)
+		return;
+
+	Assert(nodeTag(ss) == T_SortState);
+
+	if (ss->tuplesortstate == NULL)
+		return;
+
+ ProgressTupleSort(ss->tuplesortstate, ps);
+}
+
+static
+void ProgressTupleSort(Tuplesortstate* tss, ReportState* ps)
+{
+ struct ts_report* tsr;
+ MemoryContext oldcontext;
+
+ oldcontext = MemoryContextSwitchTo(ps->memcontext);
+ tsr = tuplesort_get_state(tss);
+ MemoryContextSwitchTo(oldcontext);
+
+ switch (tsr->status) {
+ case TSS_INITIAL: /* Loading tuples in mem still within memory limit */
+ case TSS_BOUNDED: /* Loading tuples in mem into bounded-size heap */
+ appendStringInfo(ps->str, "=> loading tuples in memory %d",
+ tsr->memtupcount);
+ break;
+
+ case TSS_SORTEDINMEM: /* Sort completed entirely in memory */
+ appendStringInfo(ps->str, "=> sort completed in memory %d",
+ tsr->memtupcount);
+ break;
+
+ case TSS_BUILDRUNS: /* Dumping tuples to tape */
+ appendStringInfo(ps->str, "=> dumping tuples to tapes");
+ switch (tsr->sub_status) {
+ case TSSS_INIT_TAPES:
+ appendStringInfo(ps->str, " / init tapes");
+ break;
+
+ case TSSS_DUMPING_TUPLES:
+ appendStringInfo(ps->str, " / dumping tuples");
+ break;
+
+ case TSSS_SORTING_ON_TAPES:
+ appendStringInfo(ps->str, " / sorting on tapes");
+ break;
+
+ case TSSS_MERGING_TAPES:
+ appendStringInfo(ps->str, " / merging tapes");
+ break;
+ default:
+ ;
+ };
+
+ appendStringInfo(ps->str, "\n");
+ dumpTapes(tsr, ps);
+ break;
+
+ case TSS_FINALMERGE: /* Performing final merge on-the-fly */
+ appendStringInfo(ps->str, "=> final merge sort on tapes\n");
+ dumpTapes(tsr, ps);
+ break;
+
+ case TSS_SORTEDONTAPE: /* Sort completed, final run is on tape */
+ appendStringInfo(ps->str, "=> sort completed on tape");
+ switch (tsr->sub_status) {
+ case TSSS_FETCHING_FROM_TAPES:
+ appendStringInfo(ps->str, " / fetching from tapes");
+ break;
+
+ case TSSS_FETCHING_FROM_TAPES_WITH_MERGE:
+ appendStringInfo(ps->str, " / fetching from tapes with merge");
+ break;
+ default:
+ ;
+ };
+
+ appendStringInfo(ps->str, "\n");
+ dumpTapes(tsr, ps);
+ break;
+
+ default:
+ appendStringInfo(ps->str, "=> unexpected sort state\n");
+ };
+}
+
+static
+void dumpTapes(struct ts_report* tsr, ReportState* ps)
+{
+ int i;
+ int percent_effective;
+
+ if (ps->verbose) {
+ appendStringInfoSpaces(ps->str, ps->indent * 2);
+ appendStringInfo(ps->str, ": total=%d actives=%d",
+ tsr->maxTapes, tsr->activeTapes);
+
+ if (tsr->result_tape != -1)
+ appendStringInfo(ps->str, " result=%d", tsr->result_tape);
+
+ appendStringInfo(ps->str, "\n");
+
+ for (i = 0; i< tsr->maxTapes; i++) {
+ appendStringInfoSpaces(ps->str, ps->indent * 2);
+ appendStringInfo(ps->str, " -> tape %d: %d %d %d %d %d\n",
+ i, tsr->tp_fib[i], tsr->tp_runs[i], tsr->tp_dummy[i],
+ tsr->tp_read[i], tsr->tp_write[i]);
+ }
+ }
+
+ appendStringInfoSpaces(ps->str, ps->indent * 2);
+
+ if (tsr->tp_write_effective > 0)
+ percent_effective = (tsr->tp_read_effective * 100)/tsr->tp_write_effective;
+ else
+ percent_effective = 0;
+
+ appendStringInfo(ps->str, "rows r/w merge %d/%d rows r/w effective %d/%d %d%%",
+ tsr->tp_read_merge, tsr->tp_write_merge,
+ tsr->tp_read_effective, tsr->tp_write_effective,
+ percent_effective);
+}
diff --git a/src/backend/commands/report.c b/src/backend/commands/report.c
new file mode 100644
index 0000000..36ce7a2
--- /dev/null
+++ b/src/backend/commands/report.c
@@ -0,0 +1,2120 @@
+/*-------------------------------------------------------------------------
+ *
+ * report.c
+ * Display plans properties
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/report.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/xact.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_type.h"
+#include "commands/createas.h"
+#include "commands/defrem.h"
+#include "commands/prepare.h"
+#include "commands/report.h"
+#include "executor/hashjoin.h"
+#include "foreign/fdwapi.h"
+#include "nodes/extensible.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/plannodes.h"
+#include "optimizer/clauses.h"
+#include "optimizer/planmain.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteHandler.h"
+#include "storage/bufmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/builtins.h"
+#include "utils/json.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/ruleutils.h"
+#include "utils/snapmgr.h"
+#include "utils/tuplesort.h"
+#include "utils/typcache.h"
+#include "utils/xml.h"
+
+
+static void ReportXMLTag(const char *tagname, int flags, ReportState *rpt);
+static void ReportJSONLineEnding(ReportState *rpt);
+static void ReportYAMLLineStarting(ReportState *rpt);
+static void escape_yaml(StringInfo buf, const char *str);
+
+
+ReportState* CreateReportState(int needed)
+{
+ StringInfo str;
+ ReportState* prg;
+
+ str = makeStringInfo();
+
+ /* default allocation size is 1024 */
+ if (needed > 1024)
+ enlargeStringInfo(str, needed);
+
+ prg = (ReportState*) palloc0(sizeof(ReportState));
+ prg->format = REPORT_FORMAT_TEXT; /* default */
+ prg->str = str;
+ prg->indent = 0;
+
+ prg->rtable = NULL;
+ prg->plan = NULL;
+
+ return prg;
+}
+
+int SetReportStateCosts(ReportState* prg, bool costs)
+{
+ if (prg != NULL) {
+ prg->costs = costs;
+ return 0;
+ } else {
+ return 1;
+ }
+}
+
+void FreeReportState(ReportState* prg)
+{
+ if (prg == NULL)
+ return;
+
+ if (prg->str != NULL) {
+ if (prg->str->data != NULL)
+ pfree(prg->str->data);
+
+ pfree(prg->str);
+ }
+
+ pfree(prg);
+}
+
+/*
+ * ReportQueryText -
+ * add a "Query Text" node that contains the actual text of the query
+ *
+ * The caller should have set up the options fields of *es, as well as
+ * initializing the output buffer es->str.
+ *
+ */
+void
+ReportQueryText(ReportState *es, QueryDesc *queryDesc)
+{
+ if (queryDesc->sourceText)
+ ReportPropertyText("Query Text", queryDesc->sourceText, es);
+}
+
+/*
+ * ReportPreScanNode -
+ * Prescan the planstate tree to identify which RTEs are referenced
+ *
+ * Adds the relid of each referenced RTE to *rels_used. The result controls
+ * which RTEs are assigned aliases by select_rtable_names_for_explain.
+ * This ensures that we don't confusingly assign un-suffixed aliases to RTEs
+ * that never appear in the EXPLAIN output (such as inheritance parents).
+ */
+bool
+ReportPreScanNode(PlanState *planstate, Bitmapset **rels_used)
+{
+ Plan *plan = planstate->plan;
+
+ switch (nodeTag(plan))
+ {
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_SubqueryScan:
+ case T_FunctionScan:
+ case T_ValuesScan:
+ case T_CteScan:
+ case T_WorkTableScan:
+ *rels_used = bms_add_member(*rels_used,
+ ((Scan *) plan)->scanrelid);
+ break;
+ case T_ForeignScan:
+ *rels_used = bms_add_members(*rels_used,
+ ((ForeignScan *) plan)->fs_relids);
+ break;
+ case T_CustomScan:
+ *rels_used = bms_add_members(*rels_used,
+ ((CustomScan *) plan)->custom_relids);
+ break;
+ case T_ModifyTable:
+ *rels_used = bms_add_member(*rels_used,
+ ((ModifyTable *) plan)->nominalRelation);
+ if (((ModifyTable *) plan)->exclRelRTI)
+ *rels_used = bms_add_member(*rels_used,
+ ((ModifyTable *) plan)->exclRelRTI);
+ break;
+ default:
+ break;
+ }
+
+ return planstate_tree_walker(planstate, ReportPreScanNode, rels_used);
+}
+
+/*
+ * Show the targetlist of a plan node
+ */
+void
+show_plan_tlist(PlanState *planstate, List *ancestors, ReportState *es)
+{
+ Plan *plan = planstate->plan;
+ List *context;
+ List *result = NIL;
+ bool useprefix;
+ ListCell *lc;
+
+ /* No work if empty tlist (this occurs eg in bitmap indexscans) */
+ if (plan->targetlist == NIL)
+ return;
+
+ /* The tlist of an Append isn't real helpful, so suppress it */
+ if (IsA(plan, Append))
+ return;
+
+ /* Likewise for MergeAppend and RecursiveUnion */
+ if (IsA(plan, MergeAppend))
+ return;
+
+ if (IsA(plan, RecursiveUnion))
+ return;
+
+ /*
+ * Likewise for ForeignScan that executes a direct INSERT/UPDATE/DELETE
+ *
+ * Note: the tlist for a ForeignScan that executes a direct INSERT/UPDATE
+ * might contain subplan output expressions that are confusing in this
+ * context. The tlist for a ForeignScan that executes a direct UPDATE/
+ * DELETE always contains "junk" target columns to identify the exact row
+ * to update or delete, which would be confusing in this context. So, we
+ * suppress it in all the cases.
+ */
+
+ if (IsA(plan, ForeignScan) &&
+ ((ForeignScan *) plan)->operation != CMD_SELECT)
+ return;
+
+ /* Set up deparsing context */
+ context = set_deparse_context_planstate(es->deparse_cxt,
+ (Node *) planstate,
+ ancestors);
+ useprefix = list_length(es->rtable) > 1;
+
+ /* Deparse each result column (we now include resjunk ones) */
+ foreach(lc, plan->targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+ result = lappend(result,
+ deparse_expression((Node *) tle->expr, context,
+ useprefix, false));
+ }
+
+ /* Print results */
+ ReportPropertyList("Output", result, es);
+}
+
+void show_control_qual(PlanState *planstate, List *ancestors, ReportState *es)
+{
+ Plan *plan = planstate->plan;
+
+ /* quals, sort keys, etc */
+ switch (nodeTag(plan))
+ {
+ case T_IndexScan:
+ show_scan_qual(((IndexScan *) plan)->indexqualorig,
+ "Index Cond", planstate, ancestors, es);
+ if (((IndexScan *) plan)->indexqualorig)
+ show_instrumentation_count("Rows Removed by Index Recheck", 2,
+ planstate, es);
+ show_scan_qual(((IndexScan *) plan)->indexorderbyorig,
+ "Order By", planstate, ancestors, es);
+ show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ break;
+ case T_IndexOnlyScan:
+ show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
+ "Index Cond", planstate, ancestors, es);
+ if (((IndexOnlyScan *) plan)->indexqual)
+ show_instrumentation_count("Rows Removed by Index Recheck", 2,
+ planstate, es);
+ show_scan_qual(((IndexOnlyScan *) plan)->indexorderby,
+ "Order By", planstate, ancestors, es);
+ show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ if (es->analyze)
+ ReportPropertyLong("Heap Fetches",
+ ((IndexOnlyScanState *) planstate)->ioss_HeapFetches, es);
+ break;
+ case T_BitmapIndexScan:
+ show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
+ "Index Cond", planstate, ancestors, es);
+ break;
+ case T_BitmapHeapScan:
+ show_scan_qual(((BitmapHeapScan *) plan)->bitmapqualorig,
+ "Recheck Cond", planstate, ancestors, es);
+ if (((BitmapHeapScan *) plan)->bitmapqualorig)
+ show_instrumentation_count("Rows Removed by Index Recheck", 2,
+ planstate, es);
+ show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ if (es->analyze)
+ show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
+ break;
+ case T_SampleScan:
+ show_tablesample(((SampleScan *) plan)->tablesample,
+ planstate, ancestors, es);
+ /* FALL THRU to print additional fields the same as SeqScan */
+ case T_SeqScan:
+ case T_ValuesScan:
+ case T_CteScan:
+ case T_WorkTableScan:
+ case T_SubqueryScan:
+ show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ break;
+ case T_Gather:
+ {
+ Gather *gather = (Gather *) plan;
+
+ show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ ReportPropertyInteger("Workers Planned",
+ gather->num_workers, es);
+ if (es->analyze)
+ {
+ int nworkers;
+
+ nworkers = ((GatherState *) planstate)->nworkers_launched;
+ ReportPropertyInteger("Workers Launched",
+ nworkers, es);
+ }
+ if (gather->single_copy || es->format != REPORT_FORMAT_TEXT)
+ ReportPropertyBool("Single Copy", gather->single_copy, es);
+ }
+ break;
+ case T_FunctionScan:
+ if (es->verbose)
+ {
+ List *fexprs = NIL;
+ ListCell *lc;
+
+ foreach(lc, ((FunctionScan *) plan)->functions)
+ {
+ RangeTblFunction *rtfunc = (RangeTblFunction *) lfirst(lc);
+
+ fexprs = lappend(fexprs, rtfunc->funcexpr);
+ }
+ /* We rely on show_expression to insert commas as needed */
+ show_expression((Node *) fexprs,
+ "Function Call", planstate, ancestors,
+ es->verbose, es);
+ }
+ show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ break;
+ case T_TidScan:
+ {
+ /*
+ * The tidquals list has OR semantics, so be sure to show it
+ * as an OR condition.
+ */
+ List *tidquals = ((TidScan *) plan)->tidquals;
+
+ if (list_length(tidquals) > 1)
+ tidquals = list_make1(make_orclause(tidquals));
+ show_scan_qual(tidquals, "TID Cond", planstate, ancestors, es);
+ show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ }
+ break;
+ case T_ForeignScan:
+ show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ show_foreignscan_info((ForeignScanState *) planstate, es);
+ break;
+ case T_CustomScan:
+ {
+ CustomScanState *css = (CustomScanState *) planstate;
+
+ show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ if (css->methods->ExplainCustomScan)
+ css->methods->ExplainCustomScan(css, ancestors, es);
+ }
+ break;
+ case T_NestLoop:
+ show_upper_qual(((NestLoop *) plan)->join.joinqual,
+ "Join Filter", planstate, ancestors, es);
+ if (((NestLoop *) plan)->join.joinqual)
+ show_instrumentation_count("Rows Removed by Join Filter", 1,
+ planstate, es);
+ show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 2,
+ planstate, es);
+ break;
+ case T_MergeJoin:
+ show_upper_qual(((MergeJoin *) plan)->mergeclauses,
+ "Merge Cond", planstate, ancestors, es);
+ show_upper_qual(((MergeJoin *) plan)->join.joinqual,
+ "Join Filter", planstate, ancestors, es);
+ if (((MergeJoin *) plan)->join.joinqual)
+ show_instrumentation_count("Rows Removed by Join Filter", 1,
+ planstate, es);
+ show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 2,
+ planstate, es);
+ break;
+ case T_HashJoin:
+ show_upper_qual(((HashJoin *) plan)->hashclauses,
+ "Hash Cond", planstate, ancestors, es);
+ show_upper_qual(((HashJoin *) plan)->join.joinqual,
+ "Join Filter", planstate, ancestors, es);
+ if (((HashJoin *) plan)->join.joinqual)
+ show_instrumentation_count("Rows Removed by Join Filter", 1,
+ planstate, es);
+ show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 2,
+ planstate, es);
+ break;
+ case T_Agg:
+ show_agg_keys(castNode(AggState, planstate), ancestors, es);
+ show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ break;
+ case T_Group:
+ show_group_keys(castNode(GroupState, planstate), ancestors, es);
+ show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ break;
+ case T_Sort:
+ show_sort_keys(castNode(SortState, planstate), ancestors, es);
+ show_sort_info(castNode(SortState, planstate), es);
+ break;
+ case T_MergeAppend:
+ show_merge_append_keys(castNode(MergeAppendState, planstate),
+ ancestors, es);
+ break;
+ case T_Result:
+ show_upper_qual((List *) ((Result *) plan)->resconstantqual,
+ "One-Time Filter", planstate, ancestors, es);
+ show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ break;
+ case T_ModifyTable:
+ show_modifytable_info(castNode(ModifyTableState, planstate), ancestors,
+ es);
+ break;
+ case T_Hash:
+ show_hash_info(castNode(HashState, planstate), es);
+ break;
+ default:
+ break;
+ }
+}
+
+/*
+ * Show a generic expression
+ */
+void
+show_expression(Node *node, const char *qlabel,
+ PlanState *planstate, List *ancestors,
+ bool useprefix, ReportState *es)
+{
+ List *context;
+ char *exprstr;
+
+ /* Set up deparsing context */
+ context = set_deparse_context_planstate(es->deparse_cxt,
+ (Node *) planstate,
+ ancestors);
+
+ /* Deparse the expression */
+ exprstr = deparse_expression(node, context, useprefix, false);
+
+ /* And add to es->str */
+ ReportPropertyText(qlabel, exprstr, es);
+}
+
+/*
+ * Show a qualifier expression (which is a List with implicit AND semantics)
+ */
+void
+show_qual(List *qual, const char *qlabel,
+ PlanState *planstate, List *ancestors,
+ bool useprefix, ReportState *es)
+{
+ Node *node;
+
+ /* No work if empty qual */
+ if (qual == NIL)
+ return;
+
+ /* Convert AND list to explicit AND */
+ node = (Node *) make_ands_explicit(qual);
+
+ /* And show it */
+ show_expression(node, qlabel, planstate, ancestors, useprefix, es);
+}
+
+/*
+ * Show a qualifier expression for a scan plan node
+ */
+void
+show_scan_qual(List *qual, const char *qlabel,
+ PlanState *planstate, List *ancestors,
+ ReportState *es)
+{
+ bool useprefix;
+
+ useprefix = (IsA(planstate->plan, SubqueryScan) ||es->verbose);
+ show_qual(qual, qlabel, planstate, ancestors, useprefix, es);
+}
+
+/*
+ * Show a qualifier expression for an upper-level plan node
+ */
+void
+show_upper_qual(List *qual, const char *qlabel,
+ PlanState *planstate, List *ancestors,
+ ReportState *es)
+{
+ bool useprefix;
+
+ useprefix = (list_length(es->rtable) > 1 || es->verbose);
+ show_qual(qual, qlabel, planstate, ancestors, useprefix, es);
+}
+
+/*
+ * Show the sort keys for a Sort node.
+ */
+void
+show_sort_keys(SortState *sortstate, List *ancestors, ReportState *es)
+{
+ Sort *plan = (Sort *) sortstate->ss.ps.plan;
+
+ show_sort_group_keys((PlanState *) sortstate, "Sort Key",
+ plan->numCols, plan->sortColIdx,
+ plan->sortOperators, plan->collations,
+ plan->nullsFirst,
+ ancestors, es);
+}
+
+/*
+ * Likewise, for a MergeAppend node.
+ */
+void
+show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
+ ReportState *es)
+{
+ MergeAppend *plan = (MergeAppend *) mstate->ps.plan;
+
+ show_sort_group_keys((PlanState *) mstate, "Sort Key",
+ plan->numCols, plan->sortColIdx,
+ plan->sortOperators, plan->collations,
+ plan->nullsFirst,
+ ancestors, es);
+}
+
+/*
+ * Show the grouping keys for an Agg node.
+ */
+void
+show_agg_keys(AggState *astate, List *ancestors,
+ ReportState *es)
+{
+ Agg *plan = (Agg *) astate->ss.ps.plan;
+
+ if (plan->numCols > 0 || plan->groupingSets)
+ {
+
+ /* The key columns refer to the tlist of the child plan */
+
+ ancestors = lcons(astate, ancestors);
+
+ if (plan->groupingSets)
+ show_grouping_sets(outerPlanState(astate), plan, ancestors, es);
+ else
+ show_sort_group_keys(outerPlanState(astate), "Group Key",
+ plan->numCols, plan->grpColIdx,
+ NULL, NULL, NULL,
+ ancestors, es);
+
+ ancestors = list_delete_first(ancestors);
+ }
+}
+
+void
+show_grouping_sets(PlanState *planstate, Agg *agg,
+ List *ancestors, ReportState *es)
+{
+ List *context;
+ bool useprefix;
+ ListCell *lc;
+
+ /* Set up deparsing context */
+ context = set_deparse_context_planstate(es->deparse_cxt,
+ (Node *) planstate,
+ ancestors);
+ useprefix = (list_length(es->rtable) > 1 || es->verbose);
+
+ ReportOpenGroup("Grouping Sets", "Grouping Sets", false, es);
+
+ show_grouping_set_keys(planstate, agg, NULL,
+ context, useprefix, ancestors, es);
+
+ foreach(lc, agg->chain)
+ {
+ Agg *aggnode = lfirst(lc);
+ Sort *sortnode = (Sort *) aggnode->plan.lefttree;
+
+ show_grouping_set_keys(planstate, aggnode, sortnode,
+ context, useprefix, ancestors, es);
+ }
+
+ ReportCloseGroup("Grouping Sets", "Grouping Sets", false, es);
+}
+
+void
+show_grouping_set_keys(PlanState *planstate,
+ Agg *aggnode, Sort *sortnode,
+ List *context, bool useprefix,
+ List *ancestors, ReportState *es)
+{
+ Plan *plan = planstate->plan;
+ char *exprstr;
+ ListCell *lc;
+ List *gsets = aggnode->groupingSets;
+ AttrNumber *keycols = aggnode->grpColIdx;
+
+ ReportOpenGroup("Grouping Set", NULL, true, es);
+
+ if (sortnode)
+ {
+ show_sort_group_keys(planstate, "Sort Key",
+ sortnode->numCols, sortnode->sortColIdx,
+ sortnode->sortOperators, sortnode->collations,
+ sortnode->nullsFirst,
+ ancestors, es);
+ if (es->format == REPORT_FORMAT_TEXT)
+ es->indent++;
+ }
+
+ ReportOpenGroup("Group Keys", "Group Keys", false, es);
+
+ foreach(lc, gsets)
+ {
+ List *result = NIL;
+ ListCell *lc2;
+
+ foreach(lc2, (List *) lfirst(lc))
+ {
+ Index i = lfirst_int(lc2);
+ AttrNumber keyresno = keycols[i];
+ TargetEntry *target = get_tle_by_resno(plan->targetlist,
+ keyresno);
+
+ if (!target)
+ elog(ERROR, "no tlist entry for key %d", keyresno);
+
+ /* Deparse the expression, showing any top-level cast */
+ exprstr = deparse_expression((Node *) target->expr, context,
+ useprefix, true);
+
+ result = lappend(result, exprstr);
+ }
+
+ if (!result && es->format == REPORT_FORMAT_TEXT)
+ ReportPropertyText("Group Key", "()", es);
+ else
+ ReportPropertyListNested("Group Key", result, es);
+ }
+
+ ReportCloseGroup("Group Keys", "Group Keys", false, es);
+
+ if (sortnode && es->format == REPORT_FORMAT_TEXT)
+ es->indent--;
+
+ ReportCloseGroup("Grouping Set", NULL, true, es);
+}
+
+/*
+ * Show the grouping keys for a Group node.
+ */
+void
+show_group_keys(GroupState *gstate, List *ancestors,
+ ReportState *es)
+{
+ Group *plan = (Group *) gstate->ss.ps.plan;
+
+ /* The key columns refer to the tlist of the child plan */
+ ancestors = lcons(gstate, ancestors);
+ show_sort_group_keys(outerPlanState(gstate), "Group Key",
+ plan->numCols, plan->grpColIdx,
+ NULL, NULL, NULL,
+ ancestors, es);
+ ancestors = list_delete_first(ancestors);
+}
+
+/*
+ * Common code to show sort/group keys, which are represented in plan nodes
+ * as arrays of targetlist indexes. If it's a sort key rather than a group
+ * key, also pass sort operators/collations/nullsFirst arrays.
+ */
+void
+show_sort_group_keys(PlanState *planstate, const char *qlabel,
+ int nkeys, AttrNumber *keycols,
+ Oid *sortOperators, Oid *collations, bool *nullsFirst,
+ List *ancestors, ReportState *es)
+{
+ Plan *plan = planstate->plan;
+ List *context;
+ List *result = NIL;
+ StringInfoData sortkeybuf;
+ bool useprefix;
+ int keyno;
+
+ if (nkeys <= 0)
+ return;
+
+ initStringInfo(&sortkeybuf);
+
+ /* Set up deparsing context */
+ context = set_deparse_context_planstate(es->deparse_cxt,
+ (Node *) planstate,
+ ancestors);
+ useprefix = (list_length(es->rtable) > 1 || es->verbose);
+
+ for (keyno = 0; keyno < nkeys; keyno++)
+ {
+ /* find key expression in tlist */
+ AttrNumber keyresno = keycols[keyno];
+ TargetEntry *target = get_tle_by_resno(plan->targetlist,
+ keyresno);
+ char *exprstr;
+
+ if (!target)
+ elog(ERROR, "no tlist entry for key %d", keyresno);
+
+ /* Deparse the expression, showing any top-level cast */
+ exprstr = deparse_expression((Node *) target->expr, context,
+ useprefix, true);
+ resetStringInfo(&sortkeybuf);
+ appendStringInfoString(&sortkeybuf, exprstr);
+
+ /* Append sort order information, if relevant */
+ if (sortOperators != NULL)
+ show_sortorder_options(&sortkeybuf,
+ (Node *) target->expr,
+ sortOperators[keyno],
+ collations[keyno],
+ nullsFirst[keyno]);
+
+ /* Emit one property-list item per sort key */
+ result = lappend(result, pstrdup(sortkeybuf.data));
+ }
+
+ ReportPropertyList(qlabel, result, es);
+}
+
+/*
+ * Append nondefault characteristics of the sort ordering of a column to buf
+ * (collation, direction, NULLS FIRST/LAST)
+ */
+void
+show_sortorder_options(StringInfo buf, Node *sortexpr,
+ Oid sortOperator, Oid collation, bool nullsFirst)
+{
+ Oid sortcoltype = exprType(sortexpr);
+ bool reverse = false;
+ TypeCacheEntry *typentry;
+
+ typentry = lookup_type_cache(sortcoltype,
+ TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+
+ /*
+ * Print COLLATE if it's not default. There are some cases where this is
+ * redundant, eg if expression is a column whose declared collation is
+ * that collation, but it's hard to distinguish that here.
+ */
+ if (OidIsValid(collation) && collation != DEFAULT_COLLATION_OID)
+ {
+ char *collname = get_collation_name(collation);
+
+ if (collname == NULL)
+ elog(ERROR, "cache lookup failed for collation %u", collation);
+ appendStringInfo(buf, " COLLATE %s", quote_identifier(collname));
+ }
+
+ /* Print direction if not ASC, or USING if non-default sort operator */
+ if (sortOperator == typentry->gt_opr)
+ {
+ appendStringInfoString(buf, " DESC");
+ reverse = true;
+ }
+ else if (sortOperator != typentry->lt_opr)
+ {
+ char *opname = get_opname(sortOperator);
+
+ if (opname == NULL)
+ elog(ERROR, "cache lookup failed for operator %u", sortOperator);
+ appendStringInfo(buf, " USING %s", opname);
+
+ /* Determine whether operator would be considered ASC or DESC */
+ (void) get_equality_op_for_ordering_op(sortOperator, &reverse);
+ }
+
+ /* Add NULLS FIRST/LAST only if it wouldn't be default */
+ if (nullsFirst && !reverse)
+ {
+ appendStringInfoString(buf, " NULLS FIRST");
+ }
+ else if (!nullsFirst && reverse)
+ {
+ appendStringInfoString(buf, " NULLS LAST");
+ }
+}
+
+/*
+ * Show TABLESAMPLE properties
+ */
+void
+show_tablesample(TableSampleClause *tsc, PlanState *planstate,
+ List *ancestors, ReportState *es)
+{
+ List *context;
+ bool useprefix;
+ char *method_name;
+ List *params = NIL;
+ char *repeatable;
+ ListCell *lc;
+
+ /* Set up deparsing context */
+ context = set_deparse_context_planstate(es->deparse_cxt,
+ (Node *) planstate,
+ ancestors);
+ useprefix = list_length(es->rtable) > 1;
+
+ /* Get the tablesample method name */
+ method_name = get_func_name(tsc->tsmhandler);
+
+ /* Deparse parameter expressions */
+ foreach(lc, tsc->args)
+ {
+ Node *arg = (Node *) lfirst(lc);
+
+ params = lappend(params,
+ deparse_expression(arg, context,
+ useprefix, false));
+ }
+ if (tsc->repeatable)
+ repeatable = deparse_expression((Node *) tsc->repeatable, context,
+ useprefix, false);
+ else
+ repeatable = NULL;
+
+ /* Print results */
+ if (es->format == REPORT_FORMAT_TEXT)
+ {
+ bool first = true;
+
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfo(es->str, "Sampling: %s (", method_name);
+ foreach(lc, params)
+ {
+ if (!first)
+ appendStringInfoString(es->str, ", ");
+ appendStringInfoString(es->str, (const char *) lfirst(lc));
+ first = false;
+ }
+ appendStringInfoChar(es->str, ')');
+ if (repeatable)
+ appendStringInfo(es->str, " REPEATABLE (%s)", repeatable);
+ appendStringInfoChar(es->str, '\n');
+ }
+ else
+ {
+ ReportPropertyText("Sampling Method", method_name, es);
+ ReportPropertyList("Sampling Parameters", params, es);
+ if (repeatable)
+ ReportPropertyText("Repeatable Seed", repeatable, es);
+ }
+}
+
+/*
+ * If it's EXPLAIN ANALYZE, show tuplesort stats for a sort node
+ */
+void
+show_sort_info(SortState *sortstate, ReportState *es)
+{
+ if (es->analyze && sortstate->sort_Done &&
+ sortstate->tuplesortstate != NULL)
+ {
+ Tuplesortstate *state = (Tuplesortstate *) sortstate->tuplesortstate;
+ const char *sortMethod;
+ const char *spaceType;
+ long spaceUsed;
+
+ tuplesort_get_stats(state, &sortMethod, &spaceType, &spaceUsed);
+
+ if (es->format == REPORT_FORMAT_TEXT)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfo(es->str, "Sort Method: %s %s: %ldkB\n",
+ sortMethod, spaceType, spaceUsed);
+ }
+ else
+ {
+ ReportPropertyText("Sort Method", sortMethod, es);
+ ReportPropertyLong("Sort Space Used", spaceUsed, es);
+ ReportPropertyText("Sort Space Type", spaceType, es);
+ }
+ }
+}
+
+/*
+ * Show information on hash buckets/batches.
+ */
+void
+show_hash_info(HashState *hashstate, ReportState *es)
+{
+ HashJoinTable hashtable;
+
+ hashtable = hashstate->hashtable;
+
+ if (hashtable)
+ {
+ long spacePeakKb = (hashtable->spacePeak + 1023) / 1024;
+
+ if (es->format != REPORT_FORMAT_TEXT)
+ {
+ ReportPropertyLong("Hash Buckets", hashtable->nbuckets, es);
+ ReportPropertyLong("Original Hash Buckets",
+ hashtable->nbuckets_original, es);
+ ReportPropertyLong("Hash Batches", hashtable->nbatch, es);
+ ReportPropertyLong("Original Hash Batches",
+ hashtable->nbatch_original, es);
+ ReportPropertyLong("Peak Memory Usage", spacePeakKb, es);
+ }
+ else if (hashtable->nbatch_original != hashtable->nbatch ||
+ hashtable->nbuckets_original != hashtable->nbuckets)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfo(es->str,
+ "Buckets: %d (originally %d) Batches: %d (originally %d) Memory Usage: %ldkB\n",
+ hashtable->nbuckets,
+ hashtable->nbuckets_original,
+ hashtable->nbatch,
+ hashtable->nbatch_original,
+ spacePeakKb);
+ }
+ else
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfo(es->str,
+ "Buckets: %d Batches: %d Memory Usage: %ldkB\n",
+ hashtable->nbuckets, hashtable->nbatch,
+ spacePeakKb);
+ }
+ }
+}
+
+/*
+ * If it's EXPLAIN ANALYZE, show exact/lossy pages for a BitmapHeapScan node
+ */
+void
+show_tidbitmap_info(BitmapHeapScanState *planstate, ReportState *es)
+{
+ if (es->format != REPORT_FORMAT_TEXT)
+ {
+ ReportPropertyLong("Exact Heap Blocks", planstate->exact_pages, es);
+ ReportPropertyLong("Lossy Heap Blocks", planstate->lossy_pages, es);
+ }
+ else
+ {
+ if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfoString(es->str, "Heap Blocks:");
+ if (planstate->exact_pages > 0)
+ appendStringInfo(es->str, " exact=%ld", planstate->exact_pages);
+ if (planstate->lossy_pages > 0)
+ appendStringInfo(es->str, " lossy=%ld", planstate->lossy_pages);
+ appendStringInfoChar(es->str, '\n');
+ }
+ }
+}
+
+/*
+ * If it's EXPLAIN ANALYZE, show instrumentation information for a plan node
+ *
+ * "which" identifies which instrumentation counter to print
+ */
+void
+show_instrumentation_count(const char *qlabel, int which,
+ PlanState *planstate, ReportState *es)
+{
+ double nfiltered;
+ double nloops;
+
+ if (!es->analyze || !planstate->instrument)
+ return;
+
+ if (which == 2)
+ nfiltered = planstate->instrument->nfiltered2;
+ else
+ nfiltered = planstate->instrument->nfiltered1;
+ nloops = planstate->instrument->nloops;
+
+ /* In text mode, suppress zero counts; they're not interesting enough */
+ if (nfiltered > 0 || es->format != REPORT_FORMAT_TEXT)
+ {
+ if (nloops > 0)
+ ReportPropertyFloat(qlabel, nfiltered / nloops, 0, es);
+ else
+ ReportPropertyFloat(qlabel, 0.0, 0, es);
+ }
+}
+
+/*
+ * Show extra information for a ForeignScan node.
+ */
+void
+show_foreignscan_info(ForeignScanState *fsstate, ReportState *es)
+{
+ FdwRoutine *fdwroutine = fsstate->fdwroutine;
+
+ /* Let the FDW emit whatever fields it wants */
+ if (((ForeignScan *) fsstate->ss.ps.plan)->operation != CMD_SELECT)
+ {
+ if (fdwroutine->ExplainDirectModify != NULL)
+ fdwroutine->ExplainDirectModify(fsstate, es);
+ }
+ else
+ {
+ if (fdwroutine->ExplainForeignScan != NULL)
+ fdwroutine->ExplainForeignScan(fsstate, es);
+ }
+}
+
+/*
+ * Fetch the name of an index in an EXPLAIN
+ *
+ * We allow plugins to get control here so that plans involving hypothetical
+ * indexes can be explained.
+ */
+const char *
+explain_get_index_name(Oid indexId)
+{
+ const char *result;
+
+ if (explain_get_index_name_hook)
+ result = (*explain_get_index_name_hook) (indexId);
+ else
+ result = NULL;
+ if (result == NULL)
+ {
+ /* default behavior: look in the catalogs and quote it */
+ result = get_rel_name(indexId);
+ if (result == NULL)
+ elog(ERROR, "cache lookup failed for index %u", indexId);
+ result = quote_identifier(result);
+ }
+ return result;
+}
+
+/*
+ * Add some additional details about an IndexScan or IndexOnlyScan
+ */
+void
+ReportIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
+ ReportState *es)
+{
+ const char *indexname = explain_get_index_name(indexid);
+
+ if (es->format == REPORT_FORMAT_TEXT)
+ {
+ if (ScanDirectionIsBackward(indexorderdir))
+ appendStringInfoString(es->str, " Backward");
+ appendStringInfo(es->str, " using %s", indexname);
+ }
+ else
+ {
+ const char *scandir;
+
+ switch (indexorderdir)
+ {
+ case BackwardScanDirection:
+ scandir = "Backward";
+ break;
+ case NoMovementScanDirection:
+ scandir = "NoMovement";
+ break;
+ case ForwardScanDirection:
+ scandir = "Forward";
+ break;
+ default:
+ scandir = "???";
+ break;
+ }
+ ReportPropertyText("Scan Direction", scandir, es);
+ ReportPropertyText("Index Name", indexname, es);
+ }
+}
+
+/*
+ * Show the target of a Scan node
+ */
+void
+ReportScanTarget(Scan *plan, ReportState *es)
+{
+ ReportTargetRel((Plan *) plan, plan->scanrelid, es);
+}
+
+/*
+ * Show the target of a ModifyTable node
+ *
+ * Here we show the nominal target (ie, the relation that was named in the
+ * original query). If the actual target(s) is/are different, we'll show them
+ * in show_modifytable_info().
+ */
+void
+ReportModifyTarget(ModifyTable *plan, ReportState *es)
+{
+ ReportTargetRel((Plan *) plan, plan->nominalRelation, es);
+}
+
+/*
+ * Show the target relation of a scan or modify node
+ */
+void
+ReportTargetRel(Plan *plan, Index rti, ReportState *es)
+{
+ char *objectname = NULL;
+ char *namespace = NULL;
+ const char *objecttag = NULL;
+ RangeTblEntry *rte;
+ char *refname;
+
+ rte = rt_fetch(rti, es->rtable);
+ refname = (char *) list_nth(es->rtable_names, rti - 1);
+ if (refname == NULL)
+ refname = rte->eref->aliasname;
+
+ switch (nodeTag(plan))
+ {
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_ForeignScan:
+ case T_CustomScan:
+ case T_ModifyTable:
+
+ /* Assert it's on a real relation */
+ Assert(rte->rtekind == RTE_RELATION);
+ objectname = get_rel_name(rte->relid);
+ if (es->verbose)
+ namespace = get_namespace_name(get_rel_namespace(rte->relid));
+ objecttag = "Relation Name";
+ break;
+ case T_FunctionScan:
+ {
+ FunctionScan *fscan = (FunctionScan *) plan;
+
+ /* Assert it's on a RangeFunction */
+ Assert(rte->rtekind == RTE_FUNCTION);
+
+ /*
+ * If the expression is still a function call of a single
+ * function, we can get the real name of the function.
+ * Otherwise, punt. (Even if it was a single function call
+ * originally, the optimizer could have simplified it away.)
+ */
+ if (list_length(fscan->functions) == 1)
+ {
+ RangeTblFunction *rtfunc = (RangeTblFunction *) linitial(fscan->functions);
+
+ if (IsA(rtfunc->funcexpr, FuncExpr))
+ {
+ FuncExpr *funcexpr = (FuncExpr *) rtfunc->funcexpr;
+ Oid funcid = funcexpr->funcid;
+
+ objectname = get_func_name(funcid);
+ if (es->verbose)
+ namespace =
+ get_namespace_name(get_func_namespace(funcid));
+ }
+ }
+ objecttag = "Function Name";
+ }
+ break;
+ case T_ValuesScan:
+ Assert(rte->rtekind == RTE_VALUES);
+ break;
+ case T_CteScan:
+
+ /* Assert it's on a non-self-reference CTE */
+ Assert(rte->rtekind == RTE_CTE);
+ Assert(!rte->self_reference);
+ objectname = rte->ctename;
+ objecttag = "CTE Name";
+ break;
+ case T_WorkTableScan:
+
+ /* Assert it's on a self-reference CTE */
+ Assert(rte->rtekind == RTE_CTE);
+ Assert(rte->self_reference);
+ objectname = rte->ctename;
+ objecttag = "CTE Name";
+ break;
+ default:
+ break;
+ }
+
+ if (es->format == REPORT_FORMAT_TEXT)
+ {
+ appendStringInfoString(es->str, " on");
+ if (namespace != NULL)
+ appendStringInfo(es->str, " %s.%s", quote_identifier(namespace),
+ quote_identifier(objectname));
+ else if (objectname != NULL)
+ appendStringInfo(es->str, " %s", quote_identifier(objectname));
+ if (objectname == NULL || strcmp(refname, objectname) != 0)
+ appendStringInfo(es->str, " %s", quote_identifier(refname));
+ }
+ else
+ {
+ if (objecttag != NULL && objectname != NULL)
+ ReportPropertyText(objecttag, objectname, es);
+ if (namespace != NULL)
+ ReportPropertyText("Schema", namespace, es);
+ ReportPropertyText("Alias", refname, es);
+ }
+}
+
+/*
+ * Show extra information for a ModifyTable node
+ *
+ * We have three objectives here. First, if there's more than one target
+ * table or it's different from the nominal target, identify the actual
+ * target(s). Second, give FDWs a chance to display extra info about foreign
+ * targets. Third, show information about ON CONFLICT.
+ */
+void
+show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
+ ReportState *es)
+{
+ ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+ const char *operation;
+ const char *foperation;
+ bool labeltargets;
+ int j;
+ List *idxNames = NIL;
+ ListCell *lst;
+
+ switch (node->operation)
+ {
+ case CMD_INSERT:
+ operation = "Insert";
+ foperation = "Foreign Insert";
+ break;
+ case CMD_UPDATE:
+ operation = "Update";
+ foperation = "Foreign Update";
+ break;
+ case CMD_DELETE:
+ operation = "Delete";
+ foperation = "Foreign Delete";
+ break;
+ default:
+ operation = "???";
+ foperation = "Foreign ???";
+ break;
+ }
+
+ /* Should we explicitly label target relations? */
+ labeltargets = (mtstate->mt_nplans > 1 ||
+ (mtstate->mt_nplans == 1 &&
+ mtstate->resultRelInfo->ri_RangeTableIndex != node->nominalRelation));
+
+ if (labeltargets)
+ ReportOpenGroup("Target Tables", "Target Tables", false, es);
+
+ for (j = 0; j < mtstate->mt_nplans; j++)
+ {
+ ResultRelInfo *resultRelInfo = mtstate->resultRelInfo + j;
+ FdwRoutine *fdwroutine = resultRelInfo->ri_FdwRoutine;
+
+ if (labeltargets)
+ {
+ /* Open a group for this target */
+ ReportOpenGroup("Target Table", NULL, true, es);
+
+ /*
+ * In text mode, decorate each target with operation type, so that
+ * ReportTargetRel's output of " on foo" will read nicely.
+ */
+ if (es->format == REPORT_FORMAT_TEXT)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfoString(es->str,
+ fdwroutine ? foperation : operation);
+ }
+
+ /* Identify target */
+ ReportTargetRel((Plan *) node,
+ resultRelInfo->ri_RangeTableIndex,
+ es);
+
+ if (es->format == REPORT_FORMAT_TEXT)
+ {
+ appendStringInfoChar(es->str, '\n');
+ es->indent++;
+ }
+ }
+
+ /* Give FDW a chance if needed */
+ if (!resultRelInfo->ri_usesFdwDirectModify &&
+ fdwroutine != NULL &&
+ fdwroutine->ExplainForeignModify != NULL)
+ {
+ List *fdw_private = (List *) list_nth(node->fdwPrivLists, j);
+
+ fdwroutine->ExplainForeignModify(mtstate,
+ resultRelInfo,
+ fdw_private,
+ j,
+ es);
+ }
+
+ if (labeltargets)
+ {
+ /* Undo the indentation we added in text format */
+ if (es->format == REPORT_FORMAT_TEXT)
+ es->indent--;
+
+ /* Close the group */
+ ReportCloseGroup("Target Table", NULL, true, es);
+ }
+ }
+
+ /* Gather names of ON CONFLICT arbiter indexes */
+ foreach(lst, node->arbiterIndexes)
+ {
+ char *indexname = get_rel_name(lfirst_oid(lst));
+
+ idxNames = lappend(idxNames, indexname);
+ }
+
+ if (node->onConflictAction != ONCONFLICT_NONE)
+ {
+ ReportProperty("Conflict Resolution",
+ node->onConflictAction == ONCONFLICT_NOTHING ?
+ "NOTHING" : "UPDATE",
+ false, es);
+
+ /*
+ * Don't display arbiter indexes at all when DO NOTHING variant
+ * implicitly ignores all conflicts
+ */
+ if (idxNames)
+ ReportPropertyList("Conflict Arbiter Indexes", idxNames, es);
+
+ /* ON CONFLICT DO UPDATE WHERE qual is specially displayed */
+ if (node->onConflictWhere)
+ {
+ show_upper_qual((List *) node->onConflictWhere, "Conflict Filter",
+ &mtstate->ps, ancestors, es);
+ show_instrumentation_count("Rows Removed by Conflict Filter", 1, &mtstate->ps, es);
+ }
+
+ /* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */
+ if (es->analyze && mtstate->ps.instrument)
+ {
+ double total;
+ double insert_path;
+ double other_path;
+
+ InstrEndLoop(mtstate->mt_plans[0]->instrument);
+
+ /* count the number of source rows */
+ total = mtstate->mt_plans[0]->instrument->ntuples;
+ other_path = mtstate->ps.instrument->nfiltered2;
+ insert_path = total - other_path;
+
+ ReportPropertyFloat("Tuples Inserted", insert_path, 0, es);
+ ReportPropertyFloat("Conflicting Tuples", other_path, 0, es);
+ }
+ }
+
+ if (labeltargets)
+ ReportCloseGroup("Target Tables", "Target Tables", false, es);
+}
+
+/*
+ * Does this plan node have any child nodes to report?
+ */
+bool
+ReportHasChildren(Plan *plan, PlanState *planstate)
+{
+ bool haschildren;
+
+ haschildren = planstate->initPlan ||
+ outerPlanState(planstate) ||
+ innerPlanState(planstate) ||
+ IsA(plan, ModifyTable) ||
+ IsA(plan, Append) ||
+ IsA(plan, MergeAppend) ||
+ IsA(plan, BitmapAnd) ||
+ IsA(plan, BitmapOr) ||
+ IsA(plan, SubqueryScan) ||
+ (IsA(planstate, CustomScanState) &&
+ ((CustomScanState *) planstate)->custom_ps != NIL) ||
+ planstate->subPlan;
+
+ return haschildren;
+}
+
+/*
+ * Explain the constituent plans of a ModifyTable, Append, MergeAppend,
+ * BitmapAnd, or BitmapOr node.
+ *
+ * The ancestors list should already contain the immediate parent of these
+ * plans.
+ *
+ * Note: we don't actually need to examine the Plan list members, but
+ * we need the list in order to determine the length of the PlanState array.
+ */
+void
+ReportMemberNodes(List *plans, PlanState **planstates,
+ List *ancestors, ReportState *es,
+ functionNode fn)
+{
+ int nplans = list_length(plans);
+ int j;
+
+ for (j = 0; j < nplans; j++)
+ (*fn)(planstates[j], ancestors, "Member", NULL, es);
+}
+
+/*
+ * Explain a list of SubPlans (or initPlans, which also use SubPlan nodes).
+ *
+ * The ancestors list should already contain the immediate parent of these
+ * SubPlanStates.
+ */
+void
+ReportSubPlans(List *plans, List *ancestors, const char *relationship,
+ ReportState *es, functionNode fn)
+{
+ ListCell *lst;
+
+ foreach(lst, plans)
+ {
+ SubPlanState *sps = (SubPlanState *) lfirst(lst);
+ SubPlan *sp = (SubPlan *) sps->subplan;
+
+ /*
+ * There can be multiple SubPlan nodes referencing the same physical
+ * subplan (same plan_id, which is its index in PlannedStmt.subplans).
+ * We should print a subplan only once, so track which ones we already
+ * printed. This state must be global across the plan tree, since the
+ * duplicate nodes could be in different plan nodes, eg both a bitmap
+ * indexscan's indexqual and its parent heapscan's recheck qual. (We
+ * do not worry too much about which plan node we show the subplan as
+ * attached to in such cases.)
+ */
+ if (bms_is_member(sp->plan_id, es->printed_subplans))
+ continue;
+
+ es->printed_subplans = bms_add_member(es->printed_subplans, sp->plan_id);
+ (*fn)(sps->planstate, ancestors, relationship, sp->plan_name, es);
+ }
+}
+
+/*
+ * Explain a list of children of a CustomScan.
+ */
+void
+ReportCustomChildren(CustomScanState *css, List *ancestors, ReportState *es, functionNode fn)
+{
+ ListCell *cell;
+ const char *label =
+ (list_length(css->custom_ps) != 1 ? "children" : "child");
+
+ foreach(cell, css->custom_ps)
+ (*fn)((PlanState *) lfirst(cell), ancestors, label, NULL, es);
+}
+
+/*
+ * Explain a property, such as sort keys or targets, that takes the form of
+ * a list of unlabeled items. "data" is a list of C strings.
+ */
+void
+ReportPropertyList(const char *qlabel, List *data, ReportState *rpt)
+{
+ ListCell *lc;
+ bool first = true;
+
+ switch (rpt->format)
+ {
+ case REPORT_FORMAT_TEXT:
+ appendStringInfoSpaces(rpt->str, rpt->indent * 2);
+ appendStringInfo(rpt->str, "%s: ", qlabel);
+ foreach(lc, data)
+ {
+ if (!first)
+ appendStringInfoString(rpt->str, ", ");
+ appendStringInfoString(rpt->str, (const char *) lfirst(lc));
+ first = false;
+ }
+ appendStringInfoChar(rpt->str, '\n');
+ break;
+
+ case REPORT_FORMAT_XML:
+ ReportXMLTag(qlabel, X_OPENING, rpt);
+ foreach(lc, data)
+ {
+ char *str;
+
+ appendStringInfoSpaces(rpt->str, rpt->indent * 2 + 2);
+ appendStringInfoString(rpt->str, "<Item>");
+ str = escape_xml((const char *) lfirst(lc));
+ appendStringInfoString(rpt->str, str);
+ pfree(str);
+ appendStringInfoString(rpt->str, "</Item>\n");
+ }
+ ReportXMLTag(qlabel, X_CLOSING, rpt);
+ break;
+
+ case REPORT_FORMAT_JSON:
+ ReportJSONLineEnding(rpt);
+ appendStringInfoSpaces(rpt->str, rpt->indent * 2);
+ escape_json(rpt->str, qlabel);
+ appendStringInfoString(rpt->str, ": [");
+ foreach(lc, data)
+ {
+ if (!first)
+ appendStringInfoString(rpt->str, ", ");
+ escape_json(rpt->str, (const char *) lfirst(lc));
+ first = false;
+ }
+ appendStringInfoChar(rpt->str, ']');
+ break;
+
+ case REPORT_FORMAT_YAML:
+ ReportYAMLLineStarting(rpt);
+ appendStringInfo(rpt->str, "%s: ", qlabel);
+ foreach(lc, data)
+ {
+ appendStringInfoChar(rpt->str, '\n');
+ appendStringInfoSpaces(rpt->str, rpt->indent * 2 + 2);
+ appendStringInfoString(rpt->str, "- ");
+ escape_yaml(rpt->str, (const char *) lfirst(lc));
+ }
+ break;
+ }
+}
+
+/*
+ * Explain a property that takes the form of a list of unlabeled items within
+ * another list. "data" is a list of C strings.
+ */
+void
+ReportPropertyListNested(const char *qlabel, List *data, ReportState *rpt)
+{
+ ListCell *lc;
+ bool first = true;
+
+ switch (rpt->format)
+ {
+ case REPORT_FORMAT_TEXT:
+ case REPORT_FORMAT_XML:
+ ReportPropertyList(qlabel, data, rpt);
+ return;
+
+ case REPORT_FORMAT_JSON:
+ ReportJSONLineEnding(rpt);
+ appendStringInfoSpaces(rpt->str, rpt->indent * 2);
+ appendStringInfoChar(rpt->str, '[');
+ foreach(lc, data)
+ {
+ if (!first)
+ appendStringInfoString(rpt->str, ", ");
+ escape_json(rpt->str, (const char *) lfirst(lc));
+ first = false;
+ }
+ appendStringInfoChar(rpt->str, ']');
+ break;
+
+ case REPORT_FORMAT_YAML:
+ ReportYAMLLineStarting(rpt);
+ appendStringInfoString(rpt->str, "- [");
+ foreach(lc, data)
+ {
+ if (!first)
+ appendStringInfoString(rpt->str, ", ");
+ escape_yaml(rpt->str, (const char *) lfirst(lc));
+ first = false;
+ }
+ appendStringInfoChar(rpt->str, ']');
+ break;
+ }
+}
+
+/*
+ * Explain a simple property.
+ *
+ * If "numeric" is true, the value is a number (or other value that
+ * doesn't need quoting in JSON).
+ *
+ * This usually should not be invoked directly, but via one of the datatype
+ * specific routines ReportPropertyText, ReportPropertyInteger, etc.
+ */
+void
+ReportProperty(const char *qlabel, const char *value, bool numeric,
+ ReportState *rpt)
+{
+ switch (rpt->format)
+ {
+ case REPORT_FORMAT_TEXT:
+ appendStringInfoSpaces(rpt->str, rpt->indent * 2);
+ appendStringInfo(rpt->str, "%s: %s\n", qlabel, value);
+ break;
+
+ case REPORT_FORMAT_XML:
+ {
+ char *str;
+
+ appendStringInfoSpaces(rpt->str, rpt->indent * 2);
+ ReportXMLTag(qlabel, X_OPENING | X_NOWHITESPACE, rpt);
+ str = escape_xml(value);
+ appendStringInfoString(rpt->str, str);
+ pfree(str);
+ ReportXMLTag(qlabel, X_CLOSING | X_NOWHITESPACE, rpt);
+ appendStringInfoChar(rpt->str, '\n');
+ }
+ break;
+
+ case REPORT_FORMAT_JSON:
+ ReportJSONLineEnding(rpt);
+ appendStringInfoSpaces(rpt->str, rpt->indent * 2);
+ escape_json(rpt->str, qlabel);
+ appendStringInfoString(rpt->str, ": ");
+ if (numeric)
+ appendStringInfoString(rpt->str, value);
+ else
+ escape_json(rpt->str, value);
+ break;
+
+ case REPORT_FORMAT_YAML:
+ ReportYAMLLineStarting(rpt);
+ appendStringInfo(rpt->str, "%s: ", qlabel);
+ if (numeric)
+ appendStringInfoString(rpt->str, value);
+ else
+ escape_yaml(rpt->str, value);
+ break;
+ }
+}
+
+/*
+ * Report the identifying properties of a plan node: node type, strategy,
+ * operation, parent relationship and subplan name.
+ */
+void
+ReportProperties(Plan *plan, PlanInfo *info, const char *plan_name,
+ const char *relationship, ReportState *rpt)
+{
+ if (rpt->format == REPORT_FORMAT_TEXT)
+ {
+ if (plan_name)
+ {
+ appendStringInfoSpaces(rpt->str, rpt->indent * 2);
+ appendStringInfo(rpt->str, "%s\n", plan_name);
+ rpt->indent++;
+ }
+
+ if (rpt->indent)
+ {
+ appendStringInfoSpaces(rpt->str, rpt->indent * 2);
+ appendStringInfoString(rpt->str, "-> ");
+ rpt->indent += 2;
+ }
+
+ if (plan->parallel_aware)
+ appendStringInfoString(rpt->str, "Parallel ");
+
+ appendStringInfoString(rpt->str, info->pname);
+ rpt->indent++;
+ }
+ else
+ {
+ ReportPropertyText("Node Type", info->sname, rpt);
+
+ if (info->strategy)
+ ReportPropertyText("Strategy", info->strategy, rpt);
+
+ if (info->partialmode)
+ ReportPropertyText("Partial Mode", info->partialmode, rpt);
+
+ if (info->operation)
+ ReportPropertyText("Operation", info->operation, rpt);
+
+ if (relationship)
+ ReportPropertyText("Parent Relationship", relationship, rpt);
+
+ if (plan_name)
+ ReportPropertyText("Subplan Name", plan_name, rpt);
+
+ if (info->custom_name)
+ ReportPropertyText("Custom Plan Provider", info->custom_name, rpt);
+
+ ReportPropertyBool("Parallel Aware", plan->parallel_aware, rpt);
+ }
+}
+
+/*
+ * Explain a string-valued property.
+ */
+void
+ReportPropertyText(const char *qlabel, const char *value, ReportState* rpt)
+{
+ ReportProperty(qlabel, value, false, rpt);
+}
+
+/*
+ * Explain an integer-valued property.
+ */
+void
+ReportPropertyInteger(const char *qlabel, int value, ReportState *rpt)
+{
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), "%d", value);
+ ReportProperty(qlabel, buf, true, rpt);
+}
+
+/*
+ * Explain a long-integer-valued property.
+ */
+void
+ReportPropertyLong(const char *qlabel, long value, ReportState *rpt)
+{
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), "%ld", value);
+ ReportProperty(qlabel, buf, true, rpt);
+}
+
+/*
+ * Explain a float-valued property, using the specified number of
+ * fractional digits.
+ */
+void
+ReportPropertyFloat(const char *qlabel, double value, int ndigits,
+ ReportState *rpt)
+{
+ char buf[256];
+
+ snprintf(buf, sizeof(buf), "%.*f", ndigits, value);
+ ReportProperty(qlabel, buf, true, rpt);
+}
+
+/*
+ * Explain a bool-valued property.
+ */
+void
+ReportPropertyBool(const char *qlabel, bool value, ReportState *rpt)
+{
+ ReportProperty(qlabel, value ? "true" : "false", true, rpt);
+}
+
+/*
+ * Open a group of related objects.
+ *
+ * objtype is the type of the group object, labelname is its label within
+ * a containing object (if any).
+ *
+ * If labeled is true, the group members will be labeled properties,
+ * while if it's false, they'll be unlabeled objects.
+ */
+void
+ReportOpenGroup(const char *objtype, const char *labelname,
+ bool labeled, ReportState *rpt)
+{
+ switch (rpt->format)
+ {
+ case REPORT_FORMAT_TEXT:
+ /* nothing to do */
+ break;
+
+ case REPORT_FORMAT_XML:
+ ReportXMLTag(objtype, X_OPENING, rpt);
+ rpt->indent++;
+ break;
+
+ case REPORT_FORMAT_JSON:
+ ReportJSONLineEnding(rpt);
+ appendStringInfoSpaces(rpt->str, 2 * rpt->indent);
+ if (labelname)
+ {
+ escape_json(rpt->str, labelname);
+ appendStringInfoString(rpt->str, ": ");
+ }
+ appendStringInfoChar(rpt->str, labeled ? '{' : '[');
+
+ /*
+ * In JSON format, the grouping_stack is an integer list. 0 means
+ * we've emitted nothing at this grouping level, 1 means we've
+ * emitted something (and so the next item needs a comma). See
+ * ReportJSONLineEnding().
+ */
+ rpt->grouping_stack = lcons_int(0, rpt->grouping_stack);
+ rpt->indent++;
+ break;
+
+ case REPORT_FORMAT_YAML:
+
+ /*
+ * In YAML format, the grouping stack is an integer list. 0 means
+ * we've emitted nothing at this grouping level AND this grouping
+ * level is unlabelled and must be marked with "- ". See
+ * ReportYAMLLineStarting().
+ */
+ ReportYAMLLineStarting(rpt);
+ if (labelname)
+ {
+ appendStringInfo(rpt->str, "%s: ", labelname);
+ rpt->grouping_stack = lcons_int(1, rpt->grouping_stack);
+ }
+ else
+ {
+ appendStringInfoString(rpt->str, "- ");
+ rpt->grouping_stack = lcons_int(0, rpt->grouping_stack);
+ }
+ rpt->indent++;
+ break;
+ }
+}
+
+/*
+ * Close a group of related objects.
+ * Parameters must match the corresponding ReportOpenGroup call.
+ */
+void
+ReportCloseGroup(const char *objtype, const char *labelname,
+ bool labeled, ReportState *rpt)
+{
+ switch (rpt->format)
+ {
+ case REPORT_FORMAT_TEXT:
+ /* nothing to do */
+ break;
+
+ case REPORT_FORMAT_XML:
+ rpt->indent--;
+ ReportXMLTag(objtype, X_CLOSING, rpt);
+ break;
+
+ case REPORT_FORMAT_JSON:
+ rpt->indent--;
+ appendStringInfoChar(rpt->str, '\n');
+ appendStringInfoSpaces(rpt->str, 2 * rpt->indent);
+ appendStringInfoChar(rpt->str, labeled ? '}' : ']');
+ rpt->grouping_stack = list_delete_first(rpt->grouping_stack);
+ break;
+
+ case REPORT_FORMAT_YAML:
+ rpt->indent--;
+ rpt->grouping_stack = list_delete_first(rpt->grouping_stack);
+ break;
+ }
+}
+
+
+/*
+ * Emit a "dummy" group that never has any members.
+ *
+ * objtype is the type of the group object, labelname is its label within
+ * a containing object (if any).
+ */
+void
+ReportDummyGroup(const char *objtype, const char *labelname, ReportState *rpt)
+{
+ switch (rpt->format)
+ {
+ case REPORT_FORMAT_TEXT:
+
+ /* nothing to do */
+
+ break;
+
+ case REPORT_FORMAT_XML:
+ ReportXMLTag(objtype, X_CLOSE_IMMEDIATE, rpt);
+ break;
+
+ case REPORT_FORMAT_JSON:
+ ReportJSONLineEnding(rpt);
+ appendStringInfoSpaces(rpt->str, 2 * rpt->indent);
+ if (labelname)
+ {
+ escape_json(rpt->str, labelname);
+ appendStringInfoString(rpt->str, ": ");
+ }
+ escape_json(rpt->str, objtype);
+ break;
+
+ case REPORT_FORMAT_YAML:
+ ReportYAMLLineStarting(rpt);
+ if (labelname)
+ {
+ escape_yaml(rpt->str, labelname);
+ appendStringInfoString(rpt->str, ": ");
+ }
+ else
+ {
+ appendStringInfoString(rpt->str, "- ");
+ }
+ escape_yaml(rpt->str, objtype);
+ break;
+ }
+}
+
+/*
+ * Emit the start-of-output boilerplate.
+ *
+ * This is just enough different from processing a subgroup that we need
+ * a separate pair of subroutines.
+ */
+void
+ReportBeginOutput(ReportState *rpt)
+{
+ switch (rpt->format)
+ {
+ case REPORT_FORMAT_TEXT:
+ /* nothing to do */
+ break;
+
+ case REPORT_FORMAT_XML:
+ appendStringInfoString(rpt->str,
+ "<explain xmlns=\"http://www.postgresql.org/2009/explain\">\n");
+ rpt->indent++;
+ break;
+
+ case REPORT_FORMAT_JSON:
+ /* top-level structure is an array of plans */
+ appendStringInfoChar(rpt->str, '[');
+ rpt->grouping_stack = lcons_int(0, rpt->grouping_stack);
+ rpt->indent++;
+ break;
+
+ case REPORT_FORMAT_YAML:
+ rpt->grouping_stack = lcons_int(0, rpt->grouping_stack);
+ break;
+ }
+}
+
+/*
+ * Emit the end-of-output boilerplate.
+ */
+void
+ReportEndOutput(ReportState *rpt)
+{
+ switch (rpt->format)
+ {
+ case REPORT_FORMAT_TEXT:
+ /* nothing to do */
+ break;
+
+ case REPORT_FORMAT_XML:
+ rpt->indent--;
+ appendStringInfoString(rpt->str, "</explain>");
+ break;
+
+ case REPORT_FORMAT_JSON:
+ rpt->indent--;
+ appendStringInfoString(rpt->str, "\n]");
+ rpt->grouping_stack = list_delete_first(rpt->grouping_stack);
+ break;
+
+ case REPORT_FORMAT_YAML:
+ rpt->grouping_stack = list_delete_first(rpt->grouping_stack);
+ break;
+ }
+}
+
+/*
+ * Put an appropriate separator between multiple plans
+ */
+void
+ReportSeparatePlans(ReportState *rpt)
+{
+ switch (rpt->format)
+ {
+ case REPORT_FORMAT_TEXT:
+ /* add a blank line */
+ appendStringInfoChar(rpt->str, '\n');
+ break;
+
+ case REPORT_FORMAT_XML:
+ case REPORT_FORMAT_JSON:
+ case REPORT_FORMAT_YAML:
+ /* nothing to do */
+ break;
+ }
+}
+
+/*
+ * Emit opening or closing XML tag.
+ *
+ * "flags" must contain X_OPENING, X_CLOSING, or X_CLOSE_IMMEDIATE.
+ * Optionally, OR in X_NOWHITESPACE to suppress the whitespace we'd normally
+ * add.
+ *
+ * XML restricts tag names more than our other output formats, eg they can't
+ * contain white space or slashes. Replace invalid characters with dashes,
+ * so that for example "I/O Read Time" becomes "I-O-Read-Time".
+ */
+static void
+ReportXMLTag(const char *tagname, int flags, ReportState *rpt)
+{
+ const char *s;
+ const char *valid = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.";
+
+ if ((flags & X_NOWHITESPACE) == 0)
+ appendStringInfoSpaces(rpt->str, 2 * rpt->indent);
+ appendStringInfoCharMacro(rpt->str, '<');
+ if ((flags & X_CLOSING) != 0)
+ appendStringInfoCharMacro(rpt->str, '/');
+ for (s = tagname; *s; s++)
+ appendStringInfoChar(rpt->str, strchr(valid, *s) ? *s : '-');
+ if ((flags & X_CLOSE_IMMEDIATE) != 0)
+ appendStringInfoString(rpt->str, " /");
+ appendStringInfoCharMacro(rpt->str, '>');
+ if ((flags & X_NOWHITESPACE) == 0)
+ appendStringInfoCharMacro(rpt->str, '\n');
+}
+
+/*
+ * Emit a JSON line ending.
+ *
+ * JSON requires a comma after each property but the last. To facilitate this,
+ * in JSON format, the text emitted for each property begins just prior to the
+ * preceding line-break (and comma, if applicable).
+ */
+static void
+ReportJSONLineEnding(ReportState *rpt)
+{
+ Assert(rpt->format == REPORT_FORMAT_JSON);
+ if (linitial_int(rpt->grouping_stack) != 0)
+ appendStringInfoChar(rpt->str, ',');
+ else
+ linitial_int(rpt->grouping_stack) = 1;
+ appendStringInfoChar(rpt->str, '\n');
+}
+
+/*
+ * Indent a YAML line.
+ *
+ * YAML lines are ordinarily indented by two spaces per indentation level.
+ * The text emitted for each property begins just prior to the preceding
+ * line-break, except for the first property in an unlabelled group, for which
+ * it begins immediately after the "- " that introduces the group. The first
+ * property of the group appears on the same line as the opening "- ".
+ */
+static void
+ReportYAMLLineStarting(ReportState *rpt)
+{
+ Assert(rpt->format == REPORT_FORMAT_YAML);
+ if (linitial_int(rpt->grouping_stack) == 0)
+ {
+ linitial_int(rpt->grouping_stack) = 1;
+ }
+ else
+ {
+ appendStringInfoChar(rpt->str, '\n');
+ appendStringInfoSpaces(rpt->str, rpt->indent * 2);
+ }
+}
+
+/*
+ * YAML is a superset of JSON; unfortunately, the YAML quoting rules are
+ * ridiculously complicated -- as documented in sections 5.3 and 7.3.3 of
+ * http://yaml.org/spec/1.2/spec.html -- so we chose to just quote everything.
+ * Empty strings, strings with leading or trailing whitespace, and strings
+ * containing a variety of special characters must certainly be quoted or the
+ * output is invalid; and other seemingly harmless strings like "0xa" or
+ * "true" must be quoted, lest they be interpreted as a hexadecimal or Boolean
+ * constant rather than a string.
+ */
+static void
+escape_yaml(StringInfo buf, const char *str)
+{
+ escape_json(buf, str);
+}
+
+/*
+ * Emit a newline; in text format, this ends the current output line
+ */
+void
+ReportNewLine(ReportState *rpt)
+{
+	if (rpt->format == REPORT_FORMAT_TEXT)
+		appendStringInfoChar(rpt->str, '\n');
+}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 486ddf1..fdb2f2e 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -365,6 +365,12 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
}
/*
+ * Initialize percent done
+ */
+ result->percent_done = 0;
+ result->plan_rows = 0;
+
+ /*
* Initialize any initPlans present in this node. The planner put them in
* a separate list for us.
*/
@@ -398,6 +404,9 @@ TupleTableSlot *
ExecProcNode(PlanState *node)
{
TupleTableSlot *result;
+ double computed_rows;
+ double total_rows;
+ unsigned short new_progress; /* newly computed progress percentage */
CHECK_FOR_INTERRUPTS();
@@ -577,6 +586,22 @@ ExecProcNode(PlanState *node)
break;
}
+ /*
+ * Update query progress: compare the number of rows fetched so far
+ * against the planner's row estimate for this node.
+ */
+ node->plan_rows++;
+ computed_rows = node->plan_rows;
+ total_rows = node->plan->plan_rows;
+ if (total_rows != 0)
+ new_progress = (100 * computed_rows) / total_rows;
+ else
+ new_progress = 0;
+
+ /* the planner's estimate may be exceeded at execution time */
+ if (new_progress > 100)
+ new_progress = 100;
+
+ if (new_progress > node->percent_done)
+ {
+ elog(DEBUG5, "ExecProcNode %d%%", new_progress);
+ node->percent_done = new_progress;
+ }
+
if (node->instrument)
InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);
@@ -839,6 +864,12 @@ ExecEndNode(PlanState *node)
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
break;
}
+
+ /*
+ * Reinitialize progress counters
+ */
+ node->percent_done = 0;
+ node->plan_rows = 0;
}
/*
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index d240f9c..23ac10e 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -664,9 +664,14 @@ BitmapHeapRecheck(BitmapHeapScanState *node, TupleTableSlot *slot)
TupleTableSlot *
ExecBitmapHeapScan(BitmapHeapScanState *node)
{
- return ExecScan(&node->ss,
+ TupleTableSlot *hold;
+
+ hold = ExecScan(&node->ss,
(ExecScanAccessMtd) BitmapHeapNext,
(ExecScanRecheckMtd) BitmapHeapRecheck);
+
+ return hold;
}
/* ----------------------------------------------------------------
@@ -828,6 +833,12 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
scanstate->pstate = NULL;
/*
+ * percent_done and plan_rows are initialized centrally in
+ * ExecInitNode(); no per-node initialization is needed here.
+ */
+
+ /*
* Miscellaneous initialization
*
* create expression context for node
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5550f6c..0b80c16 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -66,6 +66,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
*/
estate = node->ss.ps.state;
direction = estate->es_direction;
+
/* flip direction if this is an overall backward scan */
if (ScanDirectionIsBackward(((IndexOnlyScan *) node->ss.ps.plan)->indexorderdir))
{
@@ -306,15 +307,19 @@ IndexOnlyRecheck(IndexOnlyScanState *node, TupleTableSlot *slot)
TupleTableSlot *
ExecIndexOnlyScan(IndexOnlyScanState *node)
{
+ TupleTableSlot *hold;
+
/*
* If we have runtime keys and they've not already been set up, do it now.
*/
if (node->ioss_NumRuntimeKeys != 0 && !node->ioss_RuntimeKeysReady)
ExecReScan((PlanState *) node);
- return ExecScan(&node->ss,
+ hold = ExecScan(&node->ss,
(ExecScanAccessMtd) IndexOnlyNext,
(ExecScanRecheckMtd) IndexOnlyRecheck);
+
+ return hold;
}
/* ----------------------------------------------------------------
@@ -476,6 +481,12 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
indexstate->ioss_HeapFetches = 0;
/*
+ * percent_done and plan_rows are initialized centrally in
+ * ExecInitNode(); no per-node initialization is needed here.
+ */
+
+ /*
* Miscellaneous initialization
*
* create expression context for node
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 5afd02e..c5d4458 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -90,6 +90,7 @@ IndexNext(IndexScanState *node)
*/
estate = node->ss.ps.state;
direction = estate->es_direction;
+
/* flip direction if this is an overall backward scan */
if (ScanDirectionIsBackward(((IndexScan *) node->ss.ps.plan)->indexorderdir))
{
@@ -538,6 +539,8 @@ reorderqueue_pop(IndexScanState *node)
TupleTableSlot *
ExecIndexScan(IndexScanState *node)
{
+ TupleTableSlot *hold;
+
/*
* If we have runtime keys and they've not already been set up, do it now.
*/
@@ -545,13 +548,15 @@ ExecIndexScan(IndexScanState *node)
ExecReScan((PlanState *) node);
if (node->iss_NumOrderByKeys > 0)
- return ExecScan(&node->ss,
+ hold = ExecScan(&node->ss,
(ExecScanAccessMtd) IndexNextWithReorder,
(ExecScanRecheckMtd) IndexRecheck);
else
- return ExecScan(&node->ss,
+ hold = ExecScan(&node->ss,
(ExecScanAccessMtd) IndexNext,
(ExecScanRecheckMtd) IndexRecheck);
+
+ return hold;
}
/* ----------------------------------------------------------------
@@ -905,6 +910,12 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
indexstate->ss.ps.state = estate;
/*
+ * percent_done and plan_rows are initialized centrally in
+ * ExecInitNode(); no per-node initialization is needed here.
+ */
+
+ /*
* Miscellaneous initialization
*
* create expression context for node
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 0247bd2..34b0608 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -99,9 +99,13 @@ SampleRecheck(SampleScanState *node, TupleTableSlot *slot)
TupleTableSlot *
ExecSampleScan(SampleScanState *node)
{
- return ExecScan((ScanState *) node,
+ TupleTableSlot *hold;
+
+ hold = ExecScan((ScanState *) node,
(ExecScanAccessMtd) SampleNext,
(ExecScanRecheckMtd) SampleRecheck);
+
+ return hold;
}
/* ----------------------------------------------------------------
@@ -155,6 +159,12 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
scanstate->ss.ps.state = estate;
/*
+ * percent_done and plan_rows are initialized centrally in
+ * ExecInitNode(); no per-node initialization is needed here.
+ */
+
+ /*
* Miscellaneous initialization
*
* create expression context for node
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 5680464..e275b8a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -124,9 +124,13 @@ SeqRecheck(SeqScanState *node, TupleTableSlot *slot)
TupleTableSlot *
ExecSeqScan(SeqScanState *node)
{
- return ExecScan((ScanState *) node,
- (ExecScanAccessMtd) SeqNext,
- (ExecScanRecheckMtd) SeqRecheck);
+ TupleTableSlot *hold;
+
+ hold = ExecScan((ScanState *) node,
+ (ExecScanAccessMtd) SeqNext,
+ (ExecScanRecheckMtd) SeqRecheck);
+
+ return hold;
}
/* ----------------------------------------------------------------
@@ -179,6 +183,12 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
scanstate->ss.ps.state = estate;
/*
+ * percent_done and plan_rows are initialized centrally in
+ * ExecInitNode(); no per-node initialization is needed here.
+ */
+
+ /*
* Miscellaneous initialization
*
* create expression context for node
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index bf8545d..da0b871 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -193,6 +193,25 @@ bms_make_singleton(int x)
}
/*
+ * bms_prealloc - preallocate a bitmapset with room for member x
+ */
+Bitmapset *
+bms_prealloc(int x)
+{
+ Bitmapset *result;
+ int wordnum;
+
+ if (x < 0)
+ elog(ERROR, "negative bitmapset member not allowed");
+
+ wordnum = WORDNUM(x);
+ result = (Bitmapset *) palloc0(BITMAPSET_SIZE(wordnum + 1));
+ result->nwords = wordnum + 1;
+
+ return result;
+}
+
+/*
* bms_free - free a bitmapset
*
* Same as pfree except for allowing NULL input
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 28cef85..0f5f32d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -4238,3 +4238,248 @@ bmsToString(const Bitmapset *bms)
outBitmapset(&str, bms);
return str.data;
}
+
+/*
+ * planNodeInfo -
+ *	 extract display names (node name, operation, strategy, partial mode)
+ *	 from a plan node into info; returns -1 for unrecognized node types
+ */
+int
+planNodeInfo(struct Plan *plan, struct PlanInfo *info)
+{
+ const char *pname = NULL;
+ const char *sname = NULL;
+ const char *operation = NULL;
+ const char *partialmode = NULL;
+ const char *strategy = NULL;
+ const char *custom_name = NULL;
+ int ret = 0;
+
+ switch (nodeTag(plan))
+ {
+ case T_Result:
+ pname = sname = "Result";
+ break;
+ case T_ProjectSet:
+ pname = sname = "ProjectSet";
+ break;
+ case T_ModifyTable:
+ sname = "ModifyTable";
+ switch (((ModifyTable *) plan)->operation)
+ {
+ case CMD_INSERT:
+ pname = operation = "Insert";
+ break;
+ case CMD_UPDATE:
+ pname = operation = "Update";
+ break;
+ case CMD_DELETE:
+ pname = operation = "Delete";
+ break;
+ default:
+ pname = "???";
+ ret = -1;
+ break;
+ }
+ break;
+ case T_Append:
+ pname = sname = "Append";
+ break;
+ case T_MergeAppend:
+ pname = sname = "Merge Append";
+ break;
+ case T_RecursiveUnion:
+ pname = sname = "Recursive Union";
+ break;
+ case T_BitmapAnd:
+ pname = sname = "BitmapAnd";
+ break;
+ case T_BitmapOr:
+ pname = sname = "BitmapOr";
+ break;
+ case T_NestLoop:
+ pname = sname = "Nested Loop";
+ break;
+ case T_MergeJoin:
+ pname = "Merge"; /* "Join" gets added by jointype switch */
+ sname = "Merge Join";
+ break;
+ case T_HashJoin:
+ pname = "Hash"; /* "Join" gets added by jointype switch */
+ sname = "Hash Join";
+ break;
+ case T_SeqScan:
+ pname = sname = "Seq Scan";
+ break;
+ case T_SampleScan:
+ pname = sname = "Sample Scan";
+ break;
+ case T_Gather:
+ pname = sname = "Gather";
+ break;
+ case T_GatherMerge:
+ pname = sname = "Gather Merge";
+ break;
+ case T_IndexScan:
+ pname = sname = "Index Scan";
+ break;
+ case T_IndexOnlyScan:
+ pname = sname = "Index Only Scan";
+ break;
+ case T_BitmapIndexScan:
+ pname = sname = "Bitmap Index Scan";
+ break;
+ case T_BitmapHeapScan:
+ pname = sname = "Bitmap Heap Scan";
+ break;
+ case T_TidScan:
+ pname = sname = "Tid Scan";
+ break;
+ case T_SubqueryScan:
+ pname = sname = "Subquery Scan";
+ break;
+ case T_FunctionScan:
+ pname = sname = "Function Scan";
+ break;
+ case T_ValuesScan:
+ pname = sname = "Values Scan";
+ break;
+ case T_CteScan:
+ pname = sname = "CTE Scan";
+ break;
+ case T_WorkTableScan:
+ pname = sname = "WorkTable Scan";
+ break;
+ case T_ForeignScan:
+ sname = "Foreign Scan";
+ switch (((ForeignScan *) plan)->operation)
+ {
+ case CMD_SELECT:
+ pname = "Foreign Scan";
+ operation = "Select";
+ break;
+ case CMD_INSERT:
+ pname = "Foreign Insert";
+ operation = "Insert";
+ break;
+ case CMD_UPDATE:
+ pname = "Foreign Update";
+ operation = "Update";
+ break;
+ case CMD_DELETE:
+ pname = "Foreign Delete";
+ operation = "Delete";
+ break;
+ default:
+ pname = "???";
+ ret = -1;
+ break;
+ }
+ break;
+ case T_CustomScan:
+ sname = "Custom Scan";
+ custom_name = ((CustomScan *) plan)->methods->CustomName;
+ if (custom_name)
+ pname = psprintf("Custom Scan (%s)", custom_name);
+ else
+ pname = sname;
+ break;
+ case T_Material:
+ pname = sname = "Materialize";
+ break;
+ case T_Sort:
+ pname = sname = "Sort";
+ break;
+ case T_Group:
+ pname = sname = "Group";
+ break;
+ case T_Agg:
+ {
+ Agg *agg = (Agg *) plan;
+
+ sname = "Aggregate";
+ switch (agg->aggstrategy)
+ {
+ case AGG_PLAIN:
+ pname = "Aggregate";
+ strategy = "Plain";
+ break;
+ case AGG_SORTED:
+ pname = "GroupAggregate";
+ strategy = "Sorted";
+ break;
+ case AGG_HASHED:
+ pname = "HashAggregate";
+ strategy = "Hashed";
+ break;
+ default:
+ pname = "Aggregate ???";
+ strategy = "???";
+ ret = -1;
+ break;
+ }
+
+ if (DO_AGGSPLIT_SKIPFINAL(agg->aggsplit))
+ {
+ partialmode = "Partial";
+ pname = psprintf("%s %s", partialmode, pname);
+ }
+ else if (DO_AGGSPLIT_COMBINE(agg->aggsplit))
+ {
+ partialmode = "Finalize";
+ pname = psprintf("%s %s", partialmode, pname);
+ }
+ else
+ partialmode = "Simple";
+ }
+ break;
+ case T_WindowAgg:
+ pname = sname = "WindowAgg";
+ break;
+ case T_Unique:
+ pname = sname = "Unique";
+ break;
+ case T_SetOp:
+ sname = "SetOp";
+ switch (((SetOp *) plan)->strategy)
+ {
+ case SETOP_SORTED:
+ pname = "SetOp";
+ strategy = "Sorted";
+ break;
+ case SETOP_HASHED:
+ pname = "HashSetOp";
+ strategy = "Hashed";
+ break;
+ default:
+ pname = "SetOp ???";
+ strategy = "???";
+ ret = -1;
+ break;
+ }
+ break;
+ case T_LockRows:
+ pname = sname = "LockRows";
+ break;
+ case T_Limit:
+ pname = sname = "Limit";
+ break;
+ case T_Hash:
+ pname = sname = "Hash";
+ break;
+ default:
+ pname = sname = "???";
+ elog(LOG, "unrecognized node type: %d", (int) nodeTag(plan));
+ ret = -1;
+ break;
+ }
+
+ info->pname = pname;
+ info->sname = sname;
+ info->operation = operation;
+ info->partialmode = partialmode;
+ info->strategy = strategy;
+
+ return ret;
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 89d2836..597e572 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -268,7 +268,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
DropTransformStmt
DropUserMappingStmt ExplainStmt FetchStmt
GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
- ListenStmt LoadStmt LockStmt NotifyStmt ExplainableStmt PreparableStmt
+ ListenStmt LoadStmt LockStmt NotifyStmt ExplainableStmt PreparableStmt ProgressStmt
CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
RemoveFuncStmt RemoveOperStmt RenameStmt RevokeStmt RevokeRoleStmt
RuleActionStmt RuleActionStmtOrEmpty RuleStmt
@@ -499,6 +499,11 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> explain_option_elem
%type <list> explain_option_list
+%type <list> progress_option_list
+%type <defelt> progress_option_elem
+%type <str> progress_option_name
+%type <node> progress_option_arg
+
%type <ival> reindex_target_type reindex_target_multitable
%type <ival> reindex_option_list reindex_option_elem
@@ -611,7 +616,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
ASSERTION ASSIGNMENT ASYMMETRIC AT ATTACH ATTRIBUTE AUTHORIZATION
BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
- BOOLEAN_P BOTH BY
+ BOOLEAN_P BOTH BUFFERS BY
CACHE CALLED CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
@@ -632,7 +637,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
EXTENSION EXTERNAL EXTRACT
FALSE_P FAMILY FETCH FILTER FIRST_P FLOAT_P FOLLOWING FOR
- FORCE FOREIGN FORWARD FREEZE FROM FULL FUNCTION FUNCTIONS
+ FORCE FOREIGN FORMAT FORWARD FREEZE FROM FULL FUNCTION FUNCTIONS
GENERATED GLOBAL GRANT GRANTED GREATEST GROUP_P GROUPING
@@ -662,7 +667,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
PARALLEL PARSER PARTIAL PARTITION PASSING PASSWORD PLACING PLANS POLICY
POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY
- PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROGRAM PUBLICATION
+ PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROGRAM PROGRESS PUBLICATION
QUOTE
@@ -678,7 +683,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
SUBSCRIPTION SUBSTRING SYMMETRIC SYSID SYSTEM_P
TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN
- TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM TREAT TRIGGER TRIM TRUE_P
+ TIME TIMING TIMESTAMP TO TRAILING TRANSACTION TRANSFORM TREAT TRIGGER TRIM TRUE_P
TRUNCATE TRUSTED TYPE_P TYPES_P
UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNLOGGED
@@ -910,6 +915,7 @@ stmt :
| DropdbStmt
| ExecuteStmt
| ExplainStmt
+ | ProgressStmt
| FetchStmt
| GrantStmt
| GrantRoleStmt
@@ -10378,6 +10384,85 @@ explain_option_arg:
/*****************************************************************************
*
+ * PROGRESS:
+ * PROGRESS <backend_id>
+ *
+ *****************************************************************************/
+
+ProgressStmt:
+PROGRESS Iconst
+ {
+ ProgressStmt *n = makeNode(ProgressStmt);
+
+ n->pid = $2;
+ $$ = (Node *) n;
+}
+| PROGRESS VERBOSE Iconst
+ {
+ ProgressStmt *n = makeNode(ProgressStmt);
+
+ n->pid = $3;
+ n->options = list_make1(makeDefElem("verbose", NULL, @2));
+ $$ = (Node *) n;
+}
+| PROGRESS BUFFERS Iconst
+ {
+ ProgressStmt *n = makeNode(ProgressStmt);
+
+ n->pid = $3;
+ n->options = list_make1(makeDefElem("buffers", NULL, @2));
+ $$ = (Node *) n;
+}
+| PROGRESS TIMING Iconst
+ {
+ ProgressStmt *n = makeNode(ProgressStmt);
+
+ n->pid = $3;
+ n->options = list_make1(makeDefElem("timing", NULL, @2));
+ $$ = (Node *) n;
+}
+| PROGRESS '(' progress_option_list ')' Iconst
+ {
+ ProgressStmt *n = makeNode(ProgressStmt);
+
+ n->pid = $5;
+ n->options = $3;
+ $$ = (Node *) n;
+};
+
+progress_option_list:
+progress_option_elem {
+ $$ = list_make1($1);
+}
+| progress_option_list ',' progress_option_elem {
+ $$ = lappend($1, $3);
+};
+
+progress_option_elem:
+progress_option_name progress_option_arg {
+ $$ = makeDefElem($1, $2, @1);
+};
+
+progress_option_name: FORMAT
+{
+ $$ = "format";
+}
+| BUFFERS {
+ $$ = "buffers";
+}
+| TIMING {
+ $$ = "timing";
+}
+;
+
+progress_option_arg: NonReservedWord
+{
+ $$ = (Node*) makeString($1);
+};
+
+
+/*****************************************************************************
+ *
* QUERY:
* PREPARE <plan_name> [(args, ...)] AS <query>
*
@@ -14811,6 +14896,7 @@ unreserved_keyword:
| PROCEDURAL
| PROCEDURE
| PROGRAM
+ | PROGRESS
| PUBLICATION
| QUOTE
| RANGE
@@ -15031,6 +15117,7 @@ reserved_keyword:
| ASC
| ASYMMETRIC
| BOTH
+ | BUFFERS
| CASE
| CAST
| CHECK
@@ -15056,6 +15143,7 @@ reserved_keyword:
| FETCH
| FOR
| FOREIGN
+ | FORMAT
| FROM
| GRANT
| GROUP_P
@@ -15086,6 +15174,7 @@ reserved_keyword:
| SYMMETRIC
| TABLE
| THEN
+ | TIMING
| TO
| TRAILING
| TRUE_P
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6831342..d026cd5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -121,6 +121,7 @@
#include "storage/pmsignal.h"
#include "storage/proc.h"
#include "tcop/tcopprot.h"
+#include "executor/progress.h"
#include "utils/builtins.h"
#include "utils/datetime.h"
#include "utils/dynamic_loader.h"
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 4ca0ea4..953b8bf 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -64,6 +64,12 @@ struct BufFile
off_t *offsets; /* palloc'd array with numFiles entries */
/*
+ * palloc'd arrays with numFiles entries each, tracking the number of
+ * bytes read from and written to each underlying file
+ */
+ int *bytes_read;
+ int *bytes_write;
+
+ /*
* offsets[i] is the current seek position of files[i]. We use this to
* avoid making redundant FileSeek calls.
*/
@@ -109,8 +115,13 @@ makeBufFile(File firstfile)
file->numFiles = 1;
file->files = (File *) palloc(sizeof(File));
file->files[0] = firstfile;
+
file->offsets = (off_t *) palloc(sizeof(off_t));
file->offsets[0] = 0L;
+
+ file->bytes_read = (int*) palloc0(sizeof(int));
+ file->bytes_write = (int*) palloc0(sizeof(int));
+
file->isTemp = false;
file->isInterXact = false;
file->dirty = false;
@@ -146,6 +157,10 @@ extendBufFile(BufFile *file)
(file->numFiles + 1) * sizeof(File));
file->offsets = (off_t *) repalloc(file->offsets,
(file->numFiles + 1) * sizeof(off_t));
+
+ file->bytes_read = (int*) repalloc(file->bytes_read, (file->numFiles + 1) * sizeof(int));
+ file->bytes_write = (int*) repalloc(file->bytes_write, (file->numFiles + 1) * sizeof(int));
+
file->files[file->numFiles] = pfile;
file->offsets[file->numFiles] = 0L;
file->numFiles++;
@@ -212,6 +227,10 @@ BufFileClose(BufFile *file)
/* release the buffer space */
pfree(file->files);
pfree(file->offsets);
+
+ pfree(file->bytes_read);
+ pfree(file->bytes_write);
+
pfree(file);
}
@@ -261,6 +280,9 @@ BufFileLoadBuffer(BufFile *file)
WAIT_EVENT_BUFFILE_READ);
if (file->nbytes < 0)
file->nbytes = 0;
+
+ file->bytes_read[file->curFile] += file->nbytes;
+
file->offsets[file->curFile] += file->nbytes;
/* we choose not to advance curOffset here */
@@ -327,6 +349,9 @@ BufFileDumpBuffer(BufFile *file)
WAIT_EVENT_BUFFILE_WRITE);
if (bytestowrite <= 0)
return; /* failed to write */
+
+ file->bytes_write[file->curFile] += bytestowrite;
+
file->offsets[file->curFile] += bytestowrite;
file->curOffset += bytestowrite;
wpos += bytestowrite;
@@ -611,3 +636,25 @@ BufFileTellBlock(BufFile *file)
}
#endif
+
+/*
+ * BufFileState - return a palloc'd snapshot of per-file I/O statistics
+ */
+struct buffile_state *
+BufFileState(BufFile *file)
+{
+ struct buffile_state *bfs;
+ int i;
+
+ if (file->numFiles == 0)
+ return NULL;
+
+ bfs = (struct buffile_state *) palloc0(sizeof(struct buffile_state));
+ bfs->numFiles = file->numFiles;
+
+ bfs->bytes_read = (int *) palloc0(file->numFiles * sizeof(int));
+ bfs->bytes_write = (int *) palloc0(file->numFiles * sizeof(int));
+
+ for (i = 0; i < file->numFiles; i++)
+ {
+ bfs->bytes_read[i] = file->bytes_read[i];
+ bfs->bytes_write[i] = file->bytes_write[i];
+ }
+
+ return bfs;
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2d1ed14..dc5bc37 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -46,6 +46,7 @@
#include "storage/spin.h"
#include "utils/backend_random.h"
#include "utils/snapmgr.h"
+#include "executor/progress.h"
shmem_startup_hook_type shmem_startup_hook = NULL;
@@ -150,6 +151,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
size = add_size(size, SyncScanShmemSize());
size = add_size(size, AsyncShmemSize());
size = add_size(size, BackendRandomShmemSize());
+ size = add_size(size, ProgressShmemSize());
#ifdef EXEC_BACKEND
size = add_size(size, ShmemBackendArraySize());
#endif
@@ -270,6 +272,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
SyncScanShmemInit();
AsyncShmemInit();
BackendRandomShmemInit();
+ ProgressShmemInit();
#ifdef EXEC_BACKEND
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ebf6a92..6a9ecc4 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -172,6 +172,9 @@ static inline void ProcArrayEndTransactionInternal(PGPROC *proc,
PGXACT *pgxact, TransactionId latestXid);
static void ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid);
+/* Debugging primitive */
+static void dump_procs(void);
+
/*
* Report shared-memory space needed by CreateSharedProcArray.
*/
@@ -1257,6 +1260,60 @@ TransactionIdIsActive(TransactionId xid)
/*
+ * Convert process id to backend id.
+ * Needed for cmds/progress.c
+ */
+BackendId
+ProcPidGetBackendId(int pid)
+{
+ ProcArrayStruct *arrayP = procArray;
+ BackendId bid = InvalidBackendId;
+ int i;
+
+ /* dump_procs(); */
+
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+
+ for (i = 0; i < arrayP->numProcs; i++)
+ {
+ int pgprocno = arrayP->pgprocnos[i];
+ volatile PGPROC *proc = &allProcs[pgprocno];
+
+ if (proc->pid == pid)
+ {
+ bid = proc->backendId;
+ break;
+ }
+ }
+
+ LWLockRelease(ProcArrayLock);
+
+ return bid;
+}
+
+static void
+dump_procs(void)
+{
+ ProcArrayStruct *arrayP = procArray;
+ int i;
+
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+
+ for (i = 0; i < arrayP->numProcs; i++)
+ {
+ int pgprocno = arrayP->pgprocnos[i];
+ volatile PGPROC *proc = &allProcs[pgprocno];
+
+ elog(LOG, "pgprocno = %d, proc->pid = %d, proc->backendId = %d",
+ pgprocno,
+ proc->pid,
+ proc->backendId);
+ }
+
+ LWLockRelease(ProcArrayLock);
+}
+
+/*
* GetOldestXmin -- returns oldest transaction that was running
* when any current transaction was started.
*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 4a21d55..31a9e26 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -26,6 +26,7 @@
#include "storage/shmem.h"
#include "storage/sinval.h"
#include "tcop/tcopprot.h"
+#include "executor/progress.h"
/*
@@ -288,6 +289,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+ if (CheckProcSignal(PROCSIG_PROGRESS))
+ HandleProgressSignal();
+
SetLatch(MyLatch);
latch_sigusr1_handler();
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3e13394..f59de86 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -451,6 +451,11 @@ InitializeLWLocks(void)
for (id = 0; id < NUM_PREDICATELOCK_PARTITIONS; id++, lock++)
LWLockInitialize(&lock->lock, LWTRANCHE_PREDICATE_LOCK_MANAGER);
+ /* Initialize progress LWLocks in main array for MaxBackends */
+ lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS +
+ NUM_BUFFER_PARTITIONS + NUM_LOCK_PARTITIONS +
+ NUM_PREDICATELOCK_PARTITIONS;
+
/* Initialize named tranches. */
if (NamedLWLockTrancheRequests > 0)
{
@@ -494,7 +499,7 @@ RegisterLWLockTranches(void)
if (LWLockTrancheArray == NULL)
{
- LWLockTranchesAllocated = 64;
+ LWLockTranchesAllocated = 65;
LWLockTrancheArray = (char **)
MemoryContextAllocZero(TopMemoryContext,
LWLockTranchesAllocated * sizeof(char *));
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6025ec..fd2ef26 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,4 @@ OldSnapshotTimeMapLock 42
BackendRandomLock 43
LogicalRepWorkerLock 44
CLogTruncationLock 45
+ProgressLock 46
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 75c2d9a..932f60f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -42,6 +42,7 @@
#include "catalog/pg_type.h"
#include "commands/async.h"
#include "commands/prepare.h"
+#include "executor/progress.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "libpq/pqsignal.h"
@@ -2999,6 +3000,9 @@ ProcessInterrupts(void)
if (ParallelMessagePending)
HandleParallelMessages();
+
+ if (progress_requested)
+ HandleProgressRequest();
}
@@ -3793,6 +3797,12 @@ PostgresMain(int argc, char *argv[],
if (!IsUnderPostmaster)
PgStartTime = GetCurrentTimestamp();
+ /* Init Progress reporting */
+ if (IsUnderPostmaster) {
+ ProgressBackendInit();
+ on_proc_exit(ProgressBackendExit, 0);
+ }
+
/*
* POSTGRES main processing loop begins here
*
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index e30aeb1..260f185 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -26,6 +26,7 @@
#include "tcop/utility.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
+#include "executor/progress.h"
/*
@@ -96,6 +97,10 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
/* not yet executed */
qd->already_executed = false;
+ /* Track the QueryDesc from global variables */
+ MyQueryDesc = qd;
+ IsQueryDescValid = true;
+
return qd;
}
@@ -114,6 +119,9 @@ FreeQueryDesc(QueryDesc *qdesc)
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
+
+ MyQueryDesc = NULL;
+ IsQueryDescValid = false;
}
@@ -152,6 +160,13 @@ ProcessQuery(PlannedStmt *plan,
dest, params, queryEnv, 0);
/*
* Call ExecutorStart to prepare the plan for execution
*/
ExecutorStart(queryDesc, 0);
@@ -207,6 +222,7 @@ ProcessQuery(PlannedStmt *plan,
ExecutorEnd(queryDesc);
FreeQueryDesc(queryDesc);
}
/*
@@ -506,6 +522,15 @@ PortalStart(Portal portal, ParamListInfo params,
0);
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 24e5c42..2837655 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -67,6 +67,7 @@
#include "utils/acl.h"
#include "utils/guc.h"
#include "utils/syscache.h"
+#include "executor/progress.h"
/* Hook for plugins to get control in ProcessUtility() */
@@ -681,6 +682,10 @@ standard_ProcessUtility(PlannedStmt *pstmt,
queryEnv, dest);
break;
+ case T_ProgressStmt:
+ ProgressSendRequest(pstate, (ProgressStmt*) parsetree, dest);
+ break;
+
case T_AlterSystemStmt:
PreventTransactionChain(isTopLevel, "ALTER SYSTEM");
AlterSystemSetConfigFile((AlterSystemStmt *) parsetree);
@@ -2505,6 +2510,10 @@ CreateCommandTag(Node *parsetree)
tag = "EXPLAIN";
break;
+ case T_ProgressStmt:
+ tag = "PROGRESS";
+ break;
+
case T_CreateTableAsStmt:
switch (((CreateTableAsStmt *) parsetree)->relkind)
{
@@ -2846,6 +2855,7 @@ CreateCommandTag(Node *parsetree)
default:
elog(WARNING, "unrecognized node type: %d",
(int) nodeTag(parsetree));
tag = "???";
break;
}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 08b6030..cf3c3fc 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -22,6 +22,8 @@
#include "libpq/pqcomm.h"
#include "miscadmin.h"
#include "storage/backendid.h"
+#include "executor/execdesc.h"
+#include "executor/progress.h"
ProtocolVersion FrontendProtocol;
@@ -86,6 +88,13 @@ char *DatabasePath = NULL;
pid_t PostmasterPid = 0;
/*
+ * Global QueryDesc pointer.
+ * It is needed in signal context to locate the QueryDesc being executed
+ */
+QueryDesc* MyQueryDesc;
+bool IsQueryDescValid = false;
+
+/*
* IsPostmasterEnvironment is true in a postmaster process and any postmaster
* child process; it is false in a standalone process (bootstrap or
* standalone backend). IsUnderPostmaster is true in postmaster child
@@ -137,3 +146,6 @@ int VacuumPageDirty = 0;
int VacuumCostBalance = 0; /* working state for vacuum */
bool VacuumCostActive = false;
+
+StringInfo progress_str;
+ReportState* progress_state;
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 5f62cd5..3cf1caa 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -224,20 +224,6 @@ typedef union SlabSlot
} SlabSlot;
/*
- * Possible states of a Tuplesort object. These denote the states that
- * persist between calls of Tuplesort routines.
- */
-typedef enum
-{
- TSS_INITIAL, /* Loading tuples; still within memory limit */
- TSS_BOUNDED, /* Loading tuples into bounded-size heap */
- TSS_BUILDRUNS, /* Loading tuples; writing to tape */
- TSS_SORTEDINMEM, /* Sort completed entirely in memory */
- TSS_SORTEDONTAPE, /* Sort completed, final run is on tape */
- TSS_FINALMERGE /* Performing final merge on-the-fly */
-} TupSortStatus;
-
-/*
* Parameters for calculation of number of tapes to use --- see inittapes()
* and tuplesort_merge_order().
*
@@ -271,6 +257,7 @@ typedef int (*SortTupleComparator) (const SortTuple *a, const SortTuple *b,
struct Tuplesortstate
{
TupSortStatus status; /* enumerated value as shown above */
+ TupSortSubStatus sub_status; /* substatus while creating and dumping to the tape set */
int nKeys; /* number of columns in sort key */
bool randomAccess; /* did caller request random access? */
bool bounded; /* did caller specify a maximum number of
@@ -420,6 +407,20 @@ struct Tuplesortstate
int activeTapes; /* # of active input tapes in merge pass */
/*
+ * Tapeset read and write per tape
+ */
+ int *tp_read;
+ int *tp_write;
+
+ /*
+ * Totals of tuples read and written, overall and during merge passes
+ */
+ int tp_read_effective;
+ int tp_write_effective;
+ int tp_read_merge;
+ int tp_write_merge;
+
+ /*
* These variables are used after completion of sorting to keep track of
* the next tuple to return. (In the tape case, the tape's current read
* position is also critical state.)
@@ -709,6 +710,7 @@ tuplesort_begin_common(int workMem, bool randomAccess)
#endif
state->status = TSS_INITIAL;
+ state->sub_status = TSSS_INVALID;
state->randomAccess = randomAccess;
state->bounded = false;
state->tuples = true;
@@ -721,6 +723,12 @@ tuplesort_begin_common(int workMem, bool randomAccess)
state->memtupcount = 0;
+ state->tp_read_effective = 0;
+ state->tp_write_effective = 0;
+
+ state->tp_read_merge = 0;
+ state->tp_write_merge = 0;
+
/*
* Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD;
* see comments in grow_memtuples().
@@ -1619,12 +1627,15 @@ puttuple_common(Tuplesortstate *state, SortTuple *tuple)
/*
* Nope; time to switch to tape-based operation.
*/
+ state->sub_status = TSSS_INIT_TAPES;
inittapes(state);
/*
* Dump tuples until we are back under the limit.
*/
+ state->sub_status = TSSS_DUMPING_TUPLES;
dumptuples(state, false);
+ state->sub_status = TSSS_INVALID;
break;
case TSS_BOUNDED:
@@ -1716,7 +1727,9 @@ puttuple_common(Tuplesortstate *state, SortTuple *tuple)
/*
* If we are over the memory limit, dump tuples till we're under.
*/
+ state->sub_status = TSSS_DUMPING_TUPLES;
dumptuples(state, false);
+ state->sub_status = TSSS_INVALID;
break;
default:
@@ -1788,6 +1801,7 @@ tuplesort_performsort(Tuplesortstate *state)
* We were able to accumulate all the tuples within the allowed
* amount of memory. Just qsort 'em and we're done.
*/
+ state->sub_status = TSSS_SORTING_IN_MEM;
tuplesort_sort_memtuples(state);
state->current = 0;
state->eof_reached = false;
@@ -1803,12 +1817,14 @@ tuplesort_performsort(Tuplesortstate *state)
* in memory, using a heap to eliminate excess tuples. Now we
* have to transform the heap to a properly-sorted array.
*/
+ state->sub_status = TSSS_SORTING_IN_MEM;
sort_bounded_heap(state);
state->current = 0;
state->eof_reached = false;
state->markpos_offset = 0;
state->markpos_eof = false;
state->status = TSS_SORTEDINMEM;
+ state->sub_status = TSSS_INVALID;
break;
case TSS_BUILDRUNS:
@@ -1819,12 +1835,15 @@ tuplesort_performsort(Tuplesortstate *state)
* run (or, if !randomAccess, one run per tape). Note that
* mergeruns sets the correct state->status.
*/
+ state->sub_status = TSSS_DUMPING_TUPLES;
dumptuples(state, true);
- mergeruns(state);
+ state->sub_status = TSSS_MERGING_TAPES;
+ mergeruns(state); /* sets state->status to TSS_SORTEDONTAPE */
state->eof_reached = false;
state->markpos_block = 0L;
state->markpos_offset = 0;
state->markpos_eof = false;
+ state->sub_status = TSSS_INVALID;
break;
default:
@@ -1867,11 +1886,15 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
case TSS_SORTEDINMEM:
Assert(forward || state->randomAccess);
Assert(!state->slabAllocatorUsed);
+
+ state->sub_status = TSSS_FETCHING_FROM_MEM;
+
if (forward)
{
if (state->current < state->memtupcount)
{
*stup = state->memtuples[state->current++];
+ state->tp_read_effective++;
return true;
}
state->eof_reached = true;
@@ -1904,6 +1927,7 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
return false;
}
*stup = state->memtuples[state->current - 1];
+ state->tp_read_effective++;
return true;
}
break;
@@ -1912,6 +1936,8 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
Assert(forward || state->randomAccess);
Assert(state->slabAllocatorUsed);
+ state->sub_status = TSSS_FETCHING_FROM_TAPES;
+
/*
* The slot that held the tuple that we returned in previous
* gettuple call can now be reused.
@@ -1930,6 +1956,8 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
if ((tuplen = getlen(state, state->result_tape, true)) != 0)
{
READTUP(state, stup, state->result_tape, tuplen);
+ state->tp_read[state->result_tape]++;
+ state->tp_read_effective++;
/*
* Remember the tuple we return, so that we can recycle
@@ -2018,6 +2046,8 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
if (nmoved != tuplen)
elog(ERROR, "bogus tuple length in backward scan");
READTUP(state, stup, state->result_tape, tuplen);
+ state->tp_read[state->result_tape]++;
+ state->tp_read_effective++;
/*
* Remember the tuple we return, so that we can recycle its memory
@@ -2032,6 +2062,8 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
/* We are managing memory ourselves, with the slab allocator. */
Assert(state->slabAllocatorUsed);
+ state->sub_status = TSSS_FETCHING_FROM_TAPES_WITH_MERGE;
+
/*
* The slab slot holding the tuple that we returned in previous
* gettuple call can now be reused.
@@ -2051,6 +2083,7 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
SortTuple newtup;
*stup = state->memtuples[0];
+ state->tp_read_effective++;
/*
* Remember the tuple we return, so that we can recycle its
@@ -2412,6 +2445,9 @@ inittapes(Tuplesortstate *state)
state->tp_runs = (int *) palloc0(maxTapes * sizeof(int));
state->tp_dummy = (int *) palloc0(maxTapes * sizeof(int));
state->tp_tapenum = (int *) palloc0(maxTapes * sizeof(int));
+
+ state->tp_read = (int *) palloc0(maxTapes * sizeof(int));
+ state->tp_write = (int *) palloc0(maxTapes * sizeof(int));
/*
* Give replacement selection a try based on user setting. There will be
@@ -2461,7 +2497,10 @@ inittapes(Tuplesortstate *state)
state->tp_runs[j] = 0;
state->tp_dummy[j] = 1;
state->tp_tapenum[j] = j;
+ state->tp_read[j] = 0;
+ state->tp_write[j] = 0;
}
+
state->tp_fib[state->tapeRange] = 0;
state->tp_dummy[state->tapeRange] = 0;
@@ -2814,6 +2853,8 @@ mergeonerun(Tuplesortstate *state)
/* write the tuple to destTape */
srcTape = state->memtuples[0].tupindex;
WRITETUP(state, destTape, &state->memtuples[0]);
+ state->tp_write[destTape]++;
+ state->tp_write_merge++;
/* recycle the slot of the tuple we just wrote out, for the next read */
if (state->memtuples[0].tuple)
@@ -2917,6 +2958,8 @@ mergereadnext(Tuplesortstate *state, int srcTape, SortTuple *stup)
return false;
}
READTUP(state, stup, srcTape, tuplen);
+ state->tp_read[srcTape]++;
+ state->tp_read_merge++;
return true;
}
@@ -2960,6 +3003,8 @@ dumptuples(Tuplesortstate *state, bool alltuples)
Assert(state->memtupcount > 0);
WRITETUP(state, state->tp_tapenum[state->destTape],
&state->memtuples[0]);
+ state->tp_write[state->tp_tapenum[state->destTape]]++;
+ state->tp_write_effective++;
tuplesort_heap_delete_top(state, true);
}
else
@@ -3097,6 +3142,8 @@ dumpbatch(Tuplesortstate *state, bool alltuples)
{
WRITETUP(state, state->tp_tapenum[state->destTape],
&state->memtuples[i]);
+ state->tp_write[state->tp_tapenum[state->destTape]]++;
+ state->tp_write_effective++;
state->memtupcount--;
}
@@ -4448,3 +4495,68 @@ free_sort_tuple(Tuplesortstate *state, SortTuple *stup)
FREEMEM(state, GetMemoryChunkSpace(stup->tuple));
pfree(stup->tuple);
}
+
+TupSortStatus tuplesort_status(Tuplesortstate* ts_state)
+{
+ return ts_state->status;
+}
+
+int tuplesort_memtupcount(Tuplesortstate* ts_state)
+{
+ return ts_state->memtupcount;
+}
+
+int tuplesort_memtupsize(Tuplesortstate* ts_state)
+{
+ return ts_state->memtupsize;
+}
+
+int tuplesort_sub_status(Tuplesortstate* ts_state)
+{
+ return ts_state->sub_status;
+}
+
+int tuplesort_get_max_tapes(Tuplesortstate* ts_state)
+{
+ return ts_state->maxTapes;
+}
+
+struct ts_report* tuplesort_get_state(Tuplesortstate* tss)
+{
+ int i;
+ struct ts_report* tsr;
+
+ tsr = (struct ts_report*) palloc0(sizeof(struct ts_report));
+
+ tsr->status = tss->status;
+ tsr->sub_status = tss->sub_status;
+
+ tsr->memtupcount = tss->memtupcount;
+ tsr->memtupsize = tss->memtupsize;
+
+ tsr->maxTapes = tss->maxTapes;
+ tsr->activeTapes = tss->activeTapes;
+ tsr->result_tape = tss->result_tape;
+
+ tsr->tp_fib = (int*) palloc0(tsr->maxTapes * sizeof(int));
+ tsr->tp_runs = (int*) palloc0(tsr->maxTapes * sizeof(int));
+ tsr->tp_dummy = (int*) palloc0(tsr->maxTapes * sizeof(int));
+ tsr->tp_read = (int*) palloc0(tsr->maxTapes * sizeof(int));
+ tsr->tp_write = (int*) palloc0(tsr->maxTapes * sizeof(int));
+
+ tsr->tp_read_effective = tss->tp_read_effective;
+ tsr->tp_write_effective = tss->tp_write_effective;
+
+ tsr->tp_read_merge = tss->tp_read_merge;
+ tsr->tp_write_merge = tss->tp_write_merge;
+
+ for (i = 0; i < tss->maxTapes; i++) {
+ tsr->tp_fib[i] = tss->tp_fib[i];
+ tsr->tp_runs[i] = tss->tp_runs[i];
+ tsr->tp_dummy[i] = tss->tp_dummy[i];
+ tsr->tp_read[i] = tss->tp_read[i];
+ tsr->tp_write[i] = tss->tp_write[i];
+ }
+
+ return tsr;
+}
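The `ts_report` snapshot above is what the PROGRESS command renders as lines such as `rows r/w effective 0/2722972 0%`. A minimal, standalone sketch of that percentage computation (the struct below is a simplified stand-in for the patch's `struct ts_report`, keeping only the counters involved):

```c
#include <assert.h>

/* Simplified stand-in for struct ts_report: only the tuple counters
 * needed to render the "rows r/w" line of the PROGRESS output. */
struct ts_report_counts
{
	int		tp_read_effective;	/* tuples returned to the caller so far */
	int		tp_write_effective; /* tuples dumped to tapes so far */
	int		tp_read_merge;		/* tuples read back during merge passes */
	int		tp_write_merge;		/* tuples rewritten during merge passes */
};

/* Effective reads as a percentage of effective writes, i.e. how much of
 * the dumped data has already been read back and returned. */
static int
sort_read_percent(const struct ts_report_counts *tsr)
{
	if (tsr->tp_write_effective == 0)
		return 0;
	return (int) ((100L * tsr->tp_read_effective) / tsr->tp_write_effective);
}
```

At the start of the final merge this yields 0% (as in the example session), and it climbs toward 100% as sorted tuples are handed back to the caller.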
diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index b3f6be7..1b0341b 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -66,17 +66,6 @@
/*
- * Possible states of a Tuplestore object. These denote the states that
- * persist between calls of Tuplestore routines.
- */
-typedef enum
-{
- TSS_INMEM, /* Tuples still fit in memory */
- TSS_WRITEFILE, /* Writing to temp file */
- TSS_READFILE /* Reading from temp file */
-} TupStoreStatus;
-
-/*
* State for a single read pointer. If we are in state INMEM then all the
* read pointers' "current" fields denote the read positions. In state
* WRITEFILE, the file/offset fields denote the read positions. In state
@@ -158,9 +147,11 @@ struct Tuplestorestate
* includes the deleted pointers.
*/
void **memtuples; /* array of pointers to palloc'd tuples */
- int memtupdeleted; /* the first N slots are currently unused */
- int memtupcount; /* number of tuples currently present */
- int memtupsize; /* allocated length of memtuples array */
+ int memtupcount; /* number of tuples currently present */
+ int memtupskipped; /* number of tuples skipped */
+ int memtupread; /* number of tuples read */
+ int memtupdeleted; /* the first N slots are currently unused */
+ int memtupsize; /* allocated length of memtuples array */
bool growmemtuples; /* memtuples' growth still underway? */
/*
@@ -178,6 +169,11 @@ struct Tuplestorestate
int writepos_file; /* file# (valid if READFILE state) */
off_t writepos_offset; /* offset (valid if READFILE state) */
+
+ int tuples_count;
+ int tuples_skipped;
+ int tuples_read; /* may exceed tuples_count if multiple readers */
+ int tuples_deleted;
};
#define COPYTUP(state,tup) ((*(state)->copytup) (state, tup))
@@ -268,6 +264,14 @@ tuplestore_begin_common(int eflags, bool interXact, int maxKBytes)
state->memtupdeleted = 0;
state->memtupcount = 0;
+ state->memtupskipped = 0;
+ state->memtupread = 0;
+
+ state->tuples_count = 0;
+ state->tuples_read = 0;
+ state->tuples_skipped = 0;
+ state->tuples_deleted = 0;
+
state->tuples = 0;
/*
@@ -285,8 +289,7 @@ tuplestore_begin_common(int eflags, bool interXact, int maxKBytes)
state->activeptr = 0;
state->readptrcount = 1;
state->readptrsize = 8; /* arbitrary */
- state->readptrs = (TSReadPointer *)
- palloc(state->readptrsize * sizeof(TSReadPointer));
+ state->readptrs = (TSReadPointer *) palloc(state->readptrsize * sizeof(TSReadPointer));
state->readptrs[0].eflags = eflags;
state->readptrs[0].eof_reached = false;
@@ -442,6 +445,9 @@ tuplestore_clear(Tuplestorestate *state)
readptr->eof_reached = false;
readptr->current = 0;
}
+
+ state->tuples_count = 0;
+ state->tuples_read = 0;
}
/*
@@ -801,7 +807,8 @@ tuplestore_puttuple_common(Tuplestorestate *state, void *tuple)
/* Stash the tuple in the in-memory array */
state->memtuples[state->memtupcount++] = tuple;
-
+ state->tuples_count++;
+
/*
* Done if we still fit in available memory and have array slots.
*/
@@ -851,6 +858,7 @@ tuplestore_puttuple_common(Tuplestorestate *state, void *tuple)
}
WRITETUP(state, tuple);
+ state->tuples_count++;
break;
case TSS_READFILE:
@@ -884,6 +892,7 @@ tuplestore_puttuple_common(Tuplestorestate *state, void *tuple)
}
WRITETUP(state, tuple);
+ state->tuples_count++;
break;
default:
elog(ERROR, "invalid tuplestore state");
@@ -920,6 +929,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
if (readptr->current < state->memtupcount)
{
/* We have another tuple, so return it */
+ state->tuples_read++;
return state->memtuples[readptr->current++];
}
readptr->eof_reached = true;
@@ -950,6 +960,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
Assert(!state->truncated);
return NULL;
}
+ state->tuples_read++;
return state->memtuples[readptr->current - 1];
}
break;
@@ -981,6 +992,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
if ((tuplen = getlen(state, true)) != 0)
{
tup = READTUP(state, tuplen);
+ state->tuples_read++;
return tup;
}
else
@@ -1053,6 +1065,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
(errcode_for_file_access(),
errmsg("could not seek in tuplestore temporary file: %m")));
tup = READTUP(state, tuplen);
+ state->tuples_read++;
return tup;
default:
@@ -1151,6 +1164,7 @@ tuplestore_skiptuples(Tuplestorestate *state, int64 ntuples, bool forward)
if (state->memtupcount - readptr->current >= ntuples)
{
readptr->current += ntuples;
+ state->tuples_skipped += ntuples;
return true;
}
readptr->current = state->memtupcount;
@@ -1168,6 +1182,7 @@ tuplestore_skiptuples(Tuplestorestate *state, int64 ntuples, bool forward)
if (readptr->current - state->memtupdeleted > ntuples)
{
readptr->current -= ntuples;
+ state->tuples_skipped += ntuples;
return true;
}
Assert(!state->truncated);
@@ -1191,6 +1206,7 @@ tuplestore_skiptuples(Tuplestorestate *state, int64 ntuples, bool forward)
pfree(tuple);
CHECK_FOR_INTERRUPTS();
}
+ state->tuples_skipped += ntuples;
return true;
}
}
@@ -1221,6 +1237,7 @@ dumptuples(Tuplestorestate *state)
if (i >= state->memtupcount)
break;
WRITETUP(state, state->memtuples[i]);
+ state->tuples_count++;
}
state->memtupdeleted = 0;
state->memtupcount = 0;
@@ -1457,6 +1474,28 @@ tuplestore_in_memory(Tuplestorestate *state)
return (state->status == TSS_INMEM);
}
+unsigned int
+tuplestore_status(Tuplestorestate *state)
+{
+ return state->status;
+}
+
+void
+tuplestore_get_state(Tuplestorestate *state, struct tss_report* tss)
+{
+ tss->memtupcount = state->memtupcount;
+ tss->memtupskipped = state->memtupskipped;
+ tss->memtupread = state->memtupread;
+ tss->memtupdeleted = state->memtupdeleted;
+
+ tss->tuples_count = state->tuples_count;
+ tss->tuples_read = state->tuples_read;
+ tss->tuples_skipped = state->tuples_skipped;
+ tss->tuples_deleted = state->tuples_deleted;
+ tss->readptrcount = state->readptrcount;
+
+ tss->status = state->status;
+}
/*
* Tape interface routines
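The tuplestore counters exported by `tuplestore_get_state()` can be combined the same way to estimate how far a reader has advanced. A hedged sketch (the struct is a cut-down stand-in for the patch's `struct tss_report`; note `tuples_read` may exceed `tuples_count` with multiple read pointers, so the result is clamped):

```c
#include <assert.h>

/* Cut-down stand-in for struct tss_report. */
struct tss_counts
{
	int		tuples_count;	/* tuples stored so far */
	int		tuples_read;	/* tuples returned to readers */
	int		tuples_skipped; /* tuples skipped over by readers */
};

/* Fraction of stored tuples already consumed, in percent, clamped to
 * [0, 100] since multiple readers can consume tuples more than once. */
static int
tuplestore_percent_read(const struct tss_counts *t)
{
	long		consumed = (long) t->tuples_read + t->tuples_skipped;

	if (t->tuples_count == 0)
		return 0;
	if (consumed >= t->tuples_count)
		return 100;
	return (int) ((100L * consumed) / t->tuples_count);
}
```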
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index b77f81d..b213ab9 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -16,42 +16,13 @@
#include "executor/executor.h"
#include "lib/stringinfo.h"
#include "parser/parse_node.h"
-
-typedef enum ExplainFormat
-{
- EXPLAIN_FORMAT_TEXT,
- EXPLAIN_FORMAT_XML,
- EXPLAIN_FORMAT_JSON,
- EXPLAIN_FORMAT_YAML
-} ExplainFormat;
-
-typedef struct ExplainState
-{
- StringInfo str; /* output buffer */
- /* options */
- bool verbose; /* be verbose */
- bool analyze; /* print actual times */
- bool costs; /* print estimated costs */
- bool buffers; /* print buffer usage */
- bool timing; /* print detailed node timing */
- bool summary; /* print total planning and execution timing */
- ExplainFormat format; /* output format */
- /* state for output formatting --- not reset for each new plan tree */
- int indent; /* current indentation level */
- List *grouping_stack; /* format-specific grouping state */
- /* state related to the current plan tree (filled by ExplainPrintPlan) */
- PlannedStmt *pstmt; /* top of plan */
- List *rtable; /* range table */
- List *rtable_names; /* alias names for RTEs */
- List *deparse_cxt; /* context list for deparsing expressions */
- Bitmapset *printed_subplans; /* ids of SubPlans we've printed */
-} ExplainState;
+#include "commands/report.h"
/* Hook for plugins to get control in ExplainOneQuery() */
typedef void (*ExplainOneQuery_hook_type) (Query *query,
int cursorOptions,
IntoClause *into,
- ExplainState *es,
+ ReportState *es,
const char *queryString,
ParamListInfo params);
extern PGDLLIMPORT ExplainOneQuery_hook_type ExplainOneQuery_hook;
@@ -64,41 +35,23 @@ extern PGDLLIMPORT explain_get_index_name_hook_type explain_get_index_name_hook;
extern void ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest);
-extern ExplainState *NewExplainState(void);
-
extern TupleDesc ExplainResultDesc(ExplainStmt *stmt);
extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+ ReportState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+ ReportState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration);
-extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
-extern void ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc);
-
-extern void ExplainQueryText(ExplainState *es, QueryDesc *queryDesc);
-
-extern void ExplainBeginOutput(ExplainState *es);
-extern void ExplainEndOutput(ExplainState *es);
-extern void ExplainSeparatePlans(ExplainState *es);
+extern void ExplainPrintPlan(ReportState *es, QueryDesc *queryDesc);
+extern void ExplainPrintTriggers(ReportState *es, QueryDesc *queryDesc);
-extern void ExplainPropertyList(const char *qlabel, List *data,
- ExplainState *es);
-extern void ExplainPropertyListNested(const char *qlabel, List *data,
- ExplainState *es);
-extern void ExplainPropertyText(const char *qlabel, const char *value,
- ExplainState *es);
-extern void ExplainPropertyInteger(const char *qlabel, int value,
- ExplainState *es);
-extern void ExplainPropertyLong(const char *qlabel, long value,
- ExplainState *es);
-extern void ExplainPropertyFloat(const char *qlabel, double value, int ndigits,
- ExplainState *es);
-extern void ExplainPropertyBool(const char *qlabel, bool value,
- ExplainState *es);
+extern void ExplainBeginOutput(ReportState *es);
+extern void ExplainEndOutput(ReportState *es);
+extern void ExplainSeparatePlans(ReportState *es);
+extern void show_buffer_usage(ReportState* es, const BufferUsage* usage);
#endif /* EXPLAIN_H */
diff --git a/src/include/commands/prepare.h b/src/include/commands/prepare.h
index c60e6f3..7ec2410 100644
--- a/src/include/commands/prepare.h
+++ b/src/include/commands/prepare.h
@@ -42,7 +42,7 @@ extern void ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
DestReceiver *dest, char *completionTag);
extern void DeallocateQuery(DeallocateStmt *stmt);
extern void ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+ ReportState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
/* Low-level access to stored prepared statements */
diff --git a/src/include/commands/report.h b/src/include/commands/report.h
new file mode 100644
index 0000000..1eb3606
--- /dev/null
+++ b/src/include/commands/report.h
@@ -0,0 +1,136 @@
+/*-------------------------------------------------------------------------
+ *
+ * report.h
+ *
+ * prototypes for report.c
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994-5, Regents of the University of California
+ *
+ * src/include/commands/report.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef REPORT_H
+#define REPORT_H
+
+#include "executor/executor.h"
+#include "lib/stringinfo.h"
+#include "parser/parse_node.h"
+
+typedef enum ReportFormat {
+ REPORT_FORMAT_TEXT,
+ REPORT_FORMAT_XML,
+ REPORT_FORMAT_JSON,
+ REPORT_FORMAT_YAML
+} ReportFormat;
+
+/*
+ * The top level report state data
+ */
+typedef struct ReportState {
+ /*
+ * options
+ */
+ bool verbose; /* be verbose */
+ bool buffers; /* print buffer usage */
+ bool timing; /* print detailed node timing */
+ bool analyze;
+ bool costs;
+ bool summary;
+
+ /*
+ * State for output formatting
+ */
+ StringInfo str; /* output buffer */
+ ReportFormat format; /* format used to output progress */
+ int indent; /* current indentation level */
+ List* grouping_stack; /* format-specific grouping state */
+
+ MemoryContext memcontext;
+ /*
+ * State related to current plan/execution tree
+ */
+ PlannedStmt* pstmt;
+ struct Plan* plan;
+ struct PlanState* planstate;
+ List* rtable;
+ List* rtable_names;
+ List* deparse_cxt; /* context list for deparsing expressions */
+ EState* es; /* Top level data */
+ Bitmapset* printed_subplans; /* ids of SubPlans we've printed */
+} ReportState;
+
+/* OR-able flags for ReportXMLTag() */
+#define X_OPENING 0
+#define X_CLOSING 1
+#define X_CLOSE_IMMEDIATE 2
+#define X_NOWHITESPACE 4
+
+extern ReportState* CreateReportState(int needed);
+extern void FreeReportState(ReportState* prg);
+extern int SetReportStateCosts(ReportState* prg, bool costs);
+
+extern void ReportOpenGroup(const char *objtype, const char *labelname, bool labeled, ReportState *rpt);
+extern void ReportCloseGroup(const char *objtype, const char *labelname, bool labeled, ReportState *rpt);
+
+extern void ReportBeginOutput(ReportState *rpt);
+extern void ReportEndOutput(ReportState* rpt);
+extern void ReportSeparatePlans(ReportState* rpt);
+
+extern void ReportProperty(const char *qlabel, const char *value, bool numeric, ReportState *rpt);
+extern void ReportProperties(Plan* plan, PlanInfo* info, const char* plan_name, const char* relationship, ReportState* rpt);
+extern void ReportPropertyList(const char *qlabel, List *data, ReportState *rpt);
+extern void ReportPropertyListNested(const char *qlabel, List *data, ReportState *rpt);
+extern void ReportPropertyText(const char *qlabel, const char *value, ReportState* rpt);
+extern void ReportPropertyInteger(const char *qlabel, int value, ReportState *rpt);
+extern void ReportPropertyLong(const char *qlabel, long value, ReportState *rpt);
+extern void ReportPropertyFloat(const char *qlabel, double value, int ndigits, ReportState *rpt);
+extern void ReportPropertyBool(const char *qlabel, bool value, ReportState *rpt);
+extern void ReportNewLine(ReportState* rpt);
+
+extern void ReportDummyGroup(const char *objtype, const char *labelname, ReportState *rpt);
+
+extern void ReportScanTarget(Scan *plan, ReportState *es);
+extern void ReportTargetRel(Plan *plan, Index rti, ReportState *es);
+extern void ReportIndexScanDetails(Oid indexid, ScanDirection indexorderdir, ReportState *es);
+extern void ReportModifyTarget(ModifyTable *plan, ReportState *es);
+
+extern bool ReportHasChildren(Plan* plan, PlanState* planstate);
+
+extern void ReportQueryText(ReportState *es, QueryDesc *queryDesc);
+extern bool ReportPreScanNode(PlanState *planstate, Bitmapset **rels_used);
+
+typedef void (*functionNode)(PlanState* planstate, List* ancestors, const char* relationship,
+ const char* plan_name, ReportState* ps);
+
+extern void ReportMemberNodes(List *plans, PlanState **planstates, List *ancestors, ReportState *es, functionNode fn);
+extern void ReportSubPlans(List *plans, List *ancestors, const char *relationship, ReportState *es, functionNode fn);
+extern void ReportCustomChildren(CustomScanState *css, List *ancestors, ReportState *es, functionNode fn);
+
+extern void show_expression(Node *node, const char *qlabel, PlanState *planstate, List *ancestors, bool useprefix, ReportState *es);
+extern void show_qual(List *qual, const char *qlabel, PlanState *planstate, List *ancestors, bool useprefix, ReportState *es);
+extern void show_scan_qual(List *qual, const char *qlabel, PlanState *planstate, List *ancestors, ReportState *es);
+extern void show_upper_qual(List *qual, const char *qlabel, PlanState *planstate, List *ancestors, ReportState *es);
+extern void show_sort_keys(SortState *sortstate, List *ancestors, ReportState *es);
+extern void show_merge_append_keys(MergeAppendState *mstate, List *ancestors, ReportState *es);
+extern void show_agg_keys(AggState *astate, List *ancestors, ReportState *es);
+extern void show_grouping_sets(PlanState *planstate, Agg *agg, List *ancestors, ReportState *es);
+extern void show_grouping_set_keys(PlanState *planstate, Agg *aggnode, Sort *sortnode,
+ List *context, bool useprefix, List *ancestors, ReportState *es);
+extern void show_group_keys(GroupState *gstate, List *ancestors, ReportState *es);
+extern void show_sort_group_keys(PlanState *planstate, const char *qlabel, int nkeys, AttrNumber *keycols,
+ Oid *sortOperators, Oid *collations, bool *nullsFirst, List *ancestors, ReportState *es);
+extern void show_sortorder_options(StringInfo buf, Node *sortexpr, Oid sortOperator, Oid collation, bool nullsFirst);
+extern void show_tablesample(TableSampleClause *tsc, PlanState *planstate, List *ancestors, ReportState *es);
+extern void show_sort_info(SortState *sortstate, ReportState *es);
+extern void show_hash_info(HashState *hashstate, ReportState *es);
+extern void show_tidbitmap_info(BitmapHeapScanState *planstate, ReportState *es);
+extern void show_instrumentation_count(const char *qlabel, int which, PlanState *planstate, ReportState *es);
+extern void show_foreignscan_info(ForeignScanState *fsstate, ReportState *es);
+extern const char *explain_get_index_name(Oid indexId);
+extern void show_modifytable_info(ModifyTableState *mtstate, List *ancestors, ReportState *es);
+extern void show_plan_tlist(PlanState *planstate, List *ancestors, ReportState *es);
+extern void show_control_qual(PlanState *planstate, List *ancestors, ReportState *es);
+
+#endif
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 37de6f2..5cc8359 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -67,4 +67,6 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
extern void FreeQueryDesc(QueryDesc *qdesc);
+extern PGDLLIMPORT QueryDesc* MyQueryDesc;
+extern PGDLLIMPORT bool IsQueryDescValid;
#endif /* EXECDESC_H */
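The `MyQueryDesc`/`IsQueryDescValid` pair exported here is set in `CreateQueryDesc()` and cleared in `FreeQueryDesc()`, so that the progress code only dereferences the pointer while it is known valid. A standalone sketch of that pattern (types and function names are simplified stand-ins, not the backend's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the executor's QueryDesc. */
typedef struct QueryDesc
{
	const char *sourceText;
} QueryDesc;

static QueryDesc *MyQueryDesc = NULL;
static bool IsQueryDescValid = false;

/* CreateQueryDesc() side: publish the descriptor. */
static void
track_querydesc(QueryDesc *qd)
{
	MyQueryDesc = qd;
	IsQueryDescValid = true;
}

/* FreeQueryDesc() side: retract it before the memory is released. */
static void
untrack_querydesc(void)
{
	MyQueryDesc = NULL;
	IsQueryDescValid = false;
}

/* Consumer side (e.g. the progress report): only look while valid. */
static const char *
current_query_text(void)
{
	return IsQueryDescValid ? MyQueryDesc->sourceText : NULL;
}
```

Clearing the flag before freeing is what keeps a progress request that arrives between queries from touching a dangling pointer.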
diff --git a/src/include/executor/progress.h b/src/include/executor/progress.h
new file mode 100644
index 0000000..3234415
--- /dev/null
+++ b/src/include/executor/progress.h
@@ -0,0 +1,52 @@
+/*-------------------------------------------------------------------------
+ *
+ * progress.h
+ * Progress of query: PROGRESS
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/progress.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PROGRESS_H
+#define PROGRESS_H
+
+/* This is temporary and needed for EXPLAIN_FORMAT_ macros */
+#include "commands/report.h"
+
+/*
+ * This size is arbitrarily defined
+ * TODO: Add a GUC variable to enable dynamic definition
+ */
+#define PROGRESS_AREA_SIZE (4096 * 64)
+
+/*
+ * Track when a progress report has been requested
+ */
+extern volatile bool progress_requested;
+
+/*
+ * global parameters in local backend memory
+ */
+extern StringInfo progress_str;
+extern ReportState* progress_state;
+
+/*
+ * Init and Fini functions
+ */
+extern size_t ProgressShmemSize(void);
+extern void ProgressShmemInit(void);
+extern void ProgressBackendInit(void);
+extern void ProgressBackendExit(int code, Datum arg);
+
+/*
+ * external functions
+ */
+extern void ProgressSendRequest(ParseState* pstate, ProgressStmt* stmt, DestReceiver* dest);
+extern void HandleProgressSignal(void);
+extern void HandleProgressRequest(void);
+
+#endif /* PROGRESS_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 6ca44f7..4fe247a 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -16,8 +16,8 @@
#include "nodes/execnodes.h"
#include "nodes/relation.h"
-/* To avoid including explain.h here, reference ExplainState thus: */
-struct ExplainState;
+/* To avoid including explain.h here, reference ReportState thus: */
+struct ReportState;
/*
@@ -120,16 +120,16 @@ typedef HeapTuple (*RefetchForeignRow_function) (EState *estate,
bool *updated);
typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
- struct ExplainState *es);
+ struct ReportState *es);
typedef void (*ExplainForeignModify_function) (ModifyTableState *mtstate,
ResultRelInfo *rinfo,
List *fdw_private,
int subplan_index,
- struct ExplainState *es);
+ struct ReportState *es);
typedef void (*ExplainDirectModify_function) (ForeignScanState *node,
- struct ExplainState *es);
+ struct ReportState *es);
typedef int (*AcquireSampleRowsFunc) (Relation relation, int elevel,
HeapTuple *rows, int targrows,
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 109f7b0..ff11d77 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -66,6 +66,7 @@ typedef enum
extern Bitmapset *bms_copy(const Bitmapset *a);
extern bool bms_equal(const Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_make_singleton(int x);
+extern Bitmapset *bms_prealloc(int max);
extern void bms_free(Bitmapset *a);
extern Bitmapset *bms_union(const Bitmapset *a, const Bitmapset *b);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4330a85..9ff4224 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -809,6 +809,9 @@ typedef struct PlanState
Instrumentation *instrument; /* Optional runtime stats for this node */
WorkerInstrumentation *worker_instrument; /* per-worker instrumentation */
+ double plan_rows; /* number of rows returned so far */
+ unsigned short percent_done; /* percentage of execution computed so far */
+
/*
* Common structural data for all Plan types. These links to subsidiary
* state trees parallel links in the associated plan tree (except for the
diff --git a/src/include/nodes/extensible.h b/src/include/nodes/extensible.h
index 0b02cc1..5ebf1cc 100644
--- a/src/include/nodes/extensible.h
+++ b/src/include/nodes/extensible.h
@@ -19,6 +19,7 @@
#include "nodes/execnodes.h"
#include "nodes/plannodes.h"
#include "nodes/relation.h"
+#include "executor/progress.h"
/* maximum length of an extensible node identifier */
#define EXTNODENAME_MAX_LEN 64
@@ -144,7 +145,10 @@ typedef struct CustomExecMethods
/* Optional: print additional information in EXPLAIN */
void (*ExplainCustomScan) (CustomScanState *node,
List *ancestors,
- ExplainState *es);
+ ReportState *es);
+
+ /* Optional: report execution progress state for PROGRESS */
+ void (*ProgressCustomScan)(CustomScanState *node, List *ancestors, ReportState *ps);
} CustomExecMethods;
extern void RegisterCustomScanMethods(const CustomScanMethods *methods);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index f59d719..f1c2dfa 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -340,6 +340,7 @@ typedef enum NodeTag
T_DropdbStmt,
T_VacuumStmt,
T_ExplainStmt,
+ T_ProgressStmt,
T_CreateTableAsStmt,
T_CreateSeqStmt,
T_AlterSeqStmt,
@@ -626,6 +627,13 @@ extern void *copyObjectImpl(const void *obj);
*/
extern bool equal(const void *a, const void *b);
+/*
+ * plan nodes functions
+ */
+struct PlanInfo;
+struct Plan;
+
+extern int planNodeInfo(struct Plan* plan, struct PlanInfo* info);
/*
* Typedefs for identifying qualifier selectivities and plan costs as such.
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 9f57388..c105b4b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3085,6 +3085,17 @@ typedef struct ExplainStmt
} ExplainStmt;
/* ----------------------
+ * PROGRESS Statement
+ * ----------------------
+ */
+typedef struct ProgressStmt {
+ NodeTag type;
+
+ int pid; /* pid of the monitored process */
+ List* options; /* format output option */
+} ProgressStmt;
+
+/* ----------------------
* CREATE TABLE AS Statement (a/k/a SELECT INTO)
*
* A query written as CREATE TABLE AS will produce this node type natively.
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index cba9155..bae48ce 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -163,6 +163,17 @@ typedef struct Plan
#define innerPlan(node) (((Plan *)(node))->righttree)
#define outerPlan(node) (((Plan *)(node))->lefttree)
+/*
+ * Structure used to fetch Plan node information in text format
+ */
+typedef struct PlanInfo {
+ const char* pname; /* node type name for text output */
+ const char *sname; /* node type name for non-text output */
+ const char *strategy;
+ const char *partialmode;
+ const char *operation;
+ const char *custom_name;
+} PlanInfo;
/* ----------------
* Result node -
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 37542aa..5764e14 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -61,6 +61,7 @@ PG_KEYWORD("binary", BINARY, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("bit", BIT, COL_NAME_KEYWORD)
PG_KEYWORD("boolean", BOOLEAN_P, COL_NAME_KEYWORD)
PG_KEYWORD("both", BOTH, RESERVED_KEYWORD)
+PG_KEYWORD("buffers", BUFFERS, UNRESERVED_KEYWORD)
PG_KEYWORD("by", BY, UNRESERVED_KEYWORD)
PG_KEYWORD("cache", CACHE, UNRESERVED_KEYWORD)
PG_KEYWORD("called", CALLED, UNRESERVED_KEYWORD)
@@ -168,6 +169,7 @@ PG_KEYWORD("following", FOLLOWING, UNRESERVED_KEYWORD)
PG_KEYWORD("for", FOR, RESERVED_KEYWORD)
PG_KEYWORD("force", FORCE, UNRESERVED_KEYWORD)
PG_KEYWORD("foreign", FOREIGN, RESERVED_KEYWORD)
+PG_KEYWORD("format", FORMAT, UNRESERVED_KEYWORD)
PG_KEYWORD("forward", FORWARD, UNRESERVED_KEYWORD)
PG_KEYWORD("freeze", FREEZE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("from", FROM, RESERVED_KEYWORD)
@@ -312,6 +314,7 @@ PG_KEYWORD("privileges", PRIVILEGES, UNRESERVED_KEYWORD)
PG_KEYWORD("procedural", PROCEDURAL, UNRESERVED_KEYWORD)
PG_KEYWORD("procedure", PROCEDURE, UNRESERVED_KEYWORD)
PG_KEYWORD("program", PROGRAM, UNRESERVED_KEYWORD)
+PG_KEYWORD("progress", PROGRESS, UNRESERVED_KEYWORD)
PG_KEYWORD("publication", PUBLICATION, UNRESERVED_KEYWORD)
PG_KEYWORD("quote", QUOTE, UNRESERVED_KEYWORD)
PG_KEYWORD("range", RANGE, UNRESERVED_KEYWORD)
@@ -397,6 +400,7 @@ PG_KEYWORD("text", TEXT_P, UNRESERVED_KEYWORD)
PG_KEYWORD("then", THEN, RESERVED_KEYWORD)
PG_KEYWORD("time", TIME, COL_NAME_KEYWORD)
PG_KEYWORD("timestamp", TIMESTAMP, COL_NAME_KEYWORD)
+PG_KEYWORD("timing", TIMING, UNRESERVED_KEYWORD)
PG_KEYWORD("to", TO, RESERVED_KEYWORD)
PG_KEYWORD("trailing", TRAILING, RESERVED_KEYWORD)
PG_KEYWORD("transaction", TRANSACTION, UNRESERVED_KEYWORD)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5e029c0..9b15433 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -812,7 +812,8 @@ typedef enum
WAIT_EVENT_SAFE_SNAPSHOT,
WAIT_EVENT_SYNC_REP,
WAIT_EVENT_LOGICAL_SYNC_DATA,
- WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE
+ WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
+ WAIT_EVENT_PROGRESS
} WaitEventIPC;
/* ----------
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index fe00bf0..8fe0cb8 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -30,6 +30,13 @@
typedef struct BufFile BufFile;
+struct buffile_state {
+ int numFiles;
+
+ int* bytes_read;
+ int* bytes_write;
+};
+
/*
* prototypes for functions in buffile.c
*/
@@ -41,5 +48,6 @@ extern size_t BufFileWrite(BufFile *file, void *ptr, size_t size);
extern int BufFileSeek(BufFile *file, int fileno, off_t offset, int whence);
extern void BufFileTell(BufFile *file, int *fileno, off_t *offset);
extern int BufFileSeekBlock(BufFile *file, long blknum);
+extern struct buffile_state* BufFileState(BufFile *file);
#endif /* BUFFILE_H */
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 9b42e49..1a113ac 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -86,6 +86,9 @@ extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
extern bool TransactionIdIsActive(TransactionId xid);
+
+extern BackendId ProcPidGetBackendId(int pid);
+
extern TransactionId GetOldestXmin(Relation rel, int flags);
extern TransactionId GetOldestActiveTransactionId(void);
extern TransactionId GetOldestSafeDecodingTransactionId(void);
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index d068dde..c0f3dbe 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -41,6 +41,9 @@ typedef enum
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+ /* progress monitoring */
+ PROCSIG_PROGRESS,
+
NUM_PROCSIGNALS /* Must be last! */
} ProcSignalReason;
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 14b9026..36f7d00 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -25,6 +25,32 @@
#include "fmgr.h"
#include "utils/relcache.h"
+/*
+ * Possible states of a Tuplesort object. These denote the states that
+ * persist between calls of Tuplesort routines.
+ */
+typedef enum
+{
+ TSS_INITIAL, /* Loading tuples; still within memory limit */
+ TSS_BOUNDED, /* Loading tuples into bounded-size heap */
+ TSS_BUILDRUNS, /* Loading tuples; writing to tape */
+ TSS_SORTEDINMEM, /* Sort completed entirely in memory */
+ TSS_SORTEDONTAPE, /* Sort completed, final run is on tape */
+ TSS_FINALMERGE /* Performing final merge on-the-fly */
+} TupSortStatus;
+
+typedef enum
+{
+ TSSS_INVALID, /* Invalid sub status */
+ TSSS_INIT_TAPES, /* Creating tapes */
+ TSSS_DUMPING_TUPLES, /* dumping tuples from mem to tapes */
+ TSSS_SORTING_IN_MEM,
+ TSSS_SORTING_ON_TAPES,
+ TSSS_MERGING_TAPES,
+ TSSS_FETCHING_FROM_MEM,
+ TSSS_FETCHING_FROM_TAPES,
+ TSSS_FETCHING_FROM_TAPES_WITH_MERGE
+} TupSortSubStatus;
/* Tuplesortstate is an opaque type whose details are not known outside
* tuplesort.c.
@@ -32,9 +58,45 @@
typedef struct Tuplesortstate Tuplesortstate;
/*
+ * Used to fetch state of Tuplesortstate
+ */
+struct ts_report {
+ TupSortStatus status;
+ TupSortSubStatus sub_status;
+
+ int memtupcount;
+ Size memtupsize;
+
+ int maxTapes;
+
+
+ int* tp_fib;
+ int* tp_runs;
+ int* tp_dummy;
+ int* tp_tapenum;
+ int activeTapes;
+ int result_tape;
+
+ int* tp_read;
+ int* tp_write;
+
+ /*
+ * Effective rows in/out from sort
+ */
+ int tp_read_effective;
+ int tp_write_effective;
+
+ /*
+ * Rows in/out needed to perform sort
+ */
+ int tp_read_merge;
+ int tp_write_merge;
+};
+
+/*
* We provide multiple interfaces to what is essentially the same code,
* since different callers have different data to be sorted and want to
- * specify the sort key information differently. There are two APIs for
+ * specify the sortkey information differently. There are two APIs for
* sorting HeapTuples and two more for sorting IndexTuples. Yet another
* API supports sorting bare Datums.
*
@@ -123,4 +185,11 @@ extern void tuplesort_rescan(Tuplesortstate *state);
extern void tuplesort_markpos(Tuplesortstate *state);
extern void tuplesort_restorepos(Tuplesortstate *state);
+extern TupSortStatus tuplesort_status(Tuplesortstate* state);
+extern int tuplesort_memtupcount(Tuplesortstate* state);
+extern int tuplesort_memtupsize(Tuplesortstate* state);
+extern int tuplesort_sub_status(Tuplesortstate* state);
+extern int tuplesort_get_max_tapes(Tuplesortstate* state);
+extern struct ts_report* tuplesort_get_state(Tuplesortstate* tss);
+
#endif /* TUPLESORT_H */
diff --git a/src/include/utils/tuplestore.h b/src/include/utils/tuplestore.h
index b31ede8..c9d9e3d 100644
--- a/src/include/utils/tuplestore.h
+++ b/src/include/utils/tuplestore.h
@@ -33,6 +33,15 @@
#include "executor/tuptable.h"
+/*
+ * Possible states of a Tuplestore object. These denote the states that
+ * persist between calls of Tuplestore routines.
+ */
+typedef enum {
+ TSS_INMEM, /* Tuples still fit in memory */
+ TSS_WRITEFILE, /* Writing to temp file */
+ TSS_READFILE /* Reading from temp file */
+} TupStoreStatus;
/* Tuplestorestate is an opaque type whose details are not known outside
* tuplestore.c.
@@ -40,6 +49,24 @@
typedef struct Tuplestorestate Tuplestorestate;
/*
+ * Used to fetch progress/status of Tuplestore
+ */
+struct tss_report {
+ int memtupcount;
+ int memtupskipped;
+ int memtupread;
+ int memtupdeleted;
+
+ int tuples_count;
+ int tuples_skipped;
+ int tuples_read;
+ int tuples_deleted;
+ int readptrcount;
+
+ int status;
+};
+
+/*
* Currently we only need to store MinimalTuples, but it would be easy
* to support the same behavior for IndexTuples and/or bare Datums.
*/
@@ -69,6 +96,8 @@ extern void tuplestore_copy_read_pointer(Tuplestorestate *state,
extern void tuplestore_trim(Tuplestorestate *state);
extern bool tuplestore_in_memory(Tuplestorestate *state);
+extern unsigned int tuplestore_status(Tuplestorestate *state);
+
extern bool tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
bool copy, TupleTableSlot *slot);
@@ -88,4 +117,6 @@ extern void tuplestore_clear(Tuplestorestate *state);
extern void tuplestore_end(Tuplestorestate *state);
+extern void tuplestore_get_state(Tuplestorestate *state, struct tss_report* tss);
+
#endif /* TUPLESTORE_H */
Hi!
On 17.04.2017 15:09, Remi Colinet wrote:
I had implemented an analogous feature as the extension *pg_query_state* [1],
whose idea I proposed in the thread [2]. Together with this extension
I provided some patches to the postgres core to enable sending
custom signals to a working backend (similar to your PROCSIG_PROGRESS) and
to print the current query state through patches to the 'ExplainNode' function.
I had implemented the same mechanics as you:
1) interrupt the working backend through ProcSignal;
2) handle progress request in the CHECK_FOR_INTERRUPTS entry;
3) transfer query state through shared memory to caller.
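The three steps above can be illustrated, outside of any PostgreSQL internals, by a minimal standalone analogue of the flag-and-checkpoint pattern (the signal choice and the worker loop are illustrative only, not the patch's actual code):

```c
#include <signal.h>

/* Set from the signal handler; the worker only polls it at safe points,
 * the way a backend notes a PROCSIG_PROGRESS request and services it
 * later at CHECK_FOR_INTERRUPTS. */
static volatile sig_atomic_t progress_requested = 0;

static void
progress_handler(int signo)
{
    (void) signo;
    progress_requested = 1;     /* only note the request; no work here */
}

/*
 * Simulated executor loop: the request is raised mid-run (standing in
 * for the monitoring backend's signal) and serviced only at the
 * checkpoint. Returns the iteration at which it was serviced, or -1
 * if it never was.
 */
int
run_worker(int iterations, int raise_at)
{
    signal(SIGUSR1, progress_handler);
    for (int i = 0; i < iterations; i++)
    {
        if (i == raise_at)
            raise(SIGUSR1);

        /* CHECK_FOR_INTERRUPTS analogue */
        if (progress_requested)
        {
            progress_requested = 0;
            return i;           /* real code would dump its plan state here */
        }
    }
    return -1;
}
```

The point of the pattern is that the handler itself does nothing unsafe; all reporting happens at a checkpoint where the worker's state is expected to be consistent.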
But the criticism of my approach was that the 'QueryDesc' structure, on the
basis of which the query state is formed, can be inconsistent at the places
where CHECK_FOR_INTERRUPTS appears [3].
I plan to propose the custom_signal patch to the community as soon as
possible and, as a consequence, release *pg_query_state* from its dependency
on patches to the postgres core. In the longer term, I want to resolve the
problem of being bound to the CHECK_FOR_INTERRUPTS entries, perhaps by
patching the executor, and implement a robust PROGRESS command.
1. https://github.com/postgrespro/pg_query_state
2.
/messages/by-id/dbfb1a42-ee58-88fd-8d77-550498f52bc5@postgrespro.ru
3. /messages/by-id/24182.1472745492@sss.pgh.pa.us
--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello Maksim,
The core implementation I suggested for the new PROGRESS command uses
different functions from those used by EXPLAIN.
Some source code is shared with the EXPLAIN command, but this shared code
only relates to quals, properties, children, subPlans and a few other nodes.
All other code for PROGRESS is new.
I don't believe the explain.c code can be fully shared with that of the new
PROGRESS command: the two commands have different purposes, and their core
code differs accordingly.
I only extracted some common code from explain.c and put it into report.c,
which is used by progress.c. This code is valid for both Plan and PlanState nodes.
The code shared is:
ReportPreScanNode() renamed from ExplainPreScanNode()
ReportBeginOutput() renamed from ExplainBeginOutput()
ReportEndOutput() renamed from ExplainEndOutput()
ReportOpenGroup() ...
ReportProperties() ...
ReportPropertyText() ...
ReportHasChildren() ...
ReportSubPlans() ...
ReportMemberNodes() ...
ReportCustomChildren() ...
ReportCloseGroup() ...
ExplainState has been renamed ReportState.
Regarding the queryDesc state of the SQL query upon receiving a request to
report its execution progress, it does not raise any issue. The request is
noted when the signal is received by the monitored backend. Then the
backend continues along its execution code path. When interrupts are checked
in the executor code, the request is dealt with.
While the request is handled, the monitored backend pauses its execution
and reports the progress of the SQL query. Whatever the status of the SQL
query, the progress.c code checks it and reports either that the SQL query
does not have a valid status, or the current execution state of the SQL query.
SQL query status checking is about:
- idle transaction
- out of transaction status
- null planned statement
- utility command
- self monitoring
Other tests can be added if needed to exclude some SQL query states. Such
checking is done in HandleProgressRequest(void).
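A minimal standalone sketch of such checking, with hypothetical stand-in types and names (the real HandleProgressRequest() inspects backend-internal state not reproduced here):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative verdicts, mirroring the placeholder rows the backend
 * would answer with (e.g. "<idle transaction>"). */
typedef enum ProgressVerdict
{
    PROGRESS_SELF_MONITORING,
    PROGRESS_OUT_OF_TRANSACTION,
    PROGRESS_IDLE_TRANSACTION,
    PROGRESS_NULL_STATEMENT,
    PROGRESS_UTILITY_COMMAND,
    PROGRESS_REPORTABLE         /* safe to walk the plan tree and report */
} ProgressVerdict;

/* Hypothetical, simplified stand-in for the monitored backend's state. */
typedef struct QueryDescStub
{
    bool        valid;          /* IsQueryDescValid analogue */
    const void *plannedstmt;    /* NULL => nothing to report */
    bool        is_utility;     /* utility commands are unsupported */
} QueryDescStub;

/*
 * Run the guards in order, from "wrong requester" to "wrong query
 * state"; only when every check passes is a progress report produced.
 */
ProgressVerdict
progress_check(const QueryDescStub *qd, bool in_transaction,
               int requester_pid, int my_pid)
{
    if (requester_pid == my_pid)
        return PROGRESS_SELF_MONITORING;
    if (!in_transaction)
        return PROGRESS_OUT_OF_TRANSACTION;
    if (qd == NULL || !qd->valid)
        return PROGRESS_IDLE_TRANSACTION;
    if (qd->plannedstmt == NULL)
        return PROGRESS_NULL_STATEMENT;
    if (qd->is_utility)
        return PROGRESS_UTILITY_COMMAND;
    return PROGRESS_REPORTABLE;
}
```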
I do not see why reporting SQL query progress would not be possible in this
context. Even when the queryDesc is NULL, we can just report an <idle
transaction> output. This is currently what the suggested patch does.
So far, I've found this new command very handy. It makes it possible to
estimate the time needed for a SQL query to complete.
A further improvement would be to report the memory, disk and time
resources used by the monitored backend. Overuse of memory, disk or time
resources can prevent the SQL query from completing.
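A sketch of what such resource reporting could build on, using POSIX getrusage(); this is an illustration of the idea, not code from the patch:

```c
#include <sys/resource.h>

/*
 * Peak resident set size of the current process, as reported by POSIX
 * getrusage(). The unit of ru_maxrss is platform-dependent (kilobytes
 * on Linux). Returns -1 on failure.
 */
long
peak_rss(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

/* User CPU time consumed so far, in whole milliseconds. */
long
user_cpu_ms(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_utime.tv_sec * 1000L + ru.ru_utime.tv_usec / 1000L;
}
```

A monitored backend could dump such numbers into the shared memory area next to the plan tree; disk usage would come from the temporary-file statistics the patch already taps (BufFileState, ts_report).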
Best regards
Remi
2017-04-18 15:00 GMT+02:00 Maksim Milyutin <m.milyutin@postgrespro.ru>:
On 18.04.2017 17:39, Remi Colinet wrote:
Hello Maksim,
The core implementation I suggested for the new PROGRESS command uses
different functions from the one used by EXPLAIN for its core
implementation.
Some source code is shared with EXPLAIN command. But this shared code is
only related to quals, properties, children, subPlans and few other nodes.
All other code for PROGRESS is new code.
I don't believe explain.c code can be fully shared with the one of the
new PROGRESS command. These 2 commands have different purposes.
The core code used for the new PROGRESS command is very different from
the core code used for EXPLAIN.
Perhaps you will be forced to duplicate significant snippets of
functionality from explain.c into your progress.c.
Regarding the queryDesc state of SQL query upon receiving a request to
report its execution progress, it does not bring any issue. The request
is noted when the signal is received by the monitored backend. Then, the
backend continues its execution code path. When interrupts are checked
in the executor code, the request will be dealt.
Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.
When the request is being dealt, the monitored backend will stop its
execution and report the progress of the SQL query. Whatever is the
status of the SQL query, progress.c code checks the status and report
either that the SQL query does not have a valid status, or otherwise the
current execution state of the SQL query.
SQL query status checking is about:
- idle transaction
- out of transaction status
- null planned statement
- utility command
- self monitoring
Other tests can be added if needed to exclude some SQL query state. Such
checking is done in void HandleProgressRequest(void).
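As a rough illustration only, the status checks listed above could have the
following shape in C. The struct layout, field names and the function name
below are all hypothetical, not the actual patch code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the monitored backend's state. */
typedef struct QueryState
{
    bool        in_transaction;  /* backend inside a transaction? */
    bool        idle;            /* transaction idle, no query running */
    bool        is_utility;      /* utility command, no plan tree */
    const void *planned_stmt;    /* NULL when nothing is planned */
    int         backend_pid;
} QueryState;

/*
 * Return a short status string when no progress report is possible,
 * or NULL when the plan tree can be walked and reported.
 */
const char *
progress_status(const QueryState *qs, int monitor_pid)
{
    if (qs == NULL || !qs->in_transaction)
        return "<out of transaction>";
    if (qs->idle)
        return "<idle transaction>";
    if (qs->backend_pid == monitor_pid)
        return "<self monitoring>";
    if (qs->is_utility)
        return "<utility command>";
    if (qs->planned_stmt == NULL)
        return "<no planned statement>";
    return NULL;                 /* valid state: report the plan */
}
```

The ordering matters only in that the cheap, always-safe checks come before
anything touches the plan.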
I do not see why a SQL query progression would not be possible in this
context. Even when the queryDesc is NULL, we can just report an <idle
transaction> output. This is currently the case with the patch suggested.
It's an interesting question - how much the active query is in a usable
state at the stage of execution. Tom Lane noticed that
CHECK_FOR_INTERRUPTS doesn't give us a 100% guarantee about full
consistency [1].
So far, I've found this new command very handy. It makes it possible to
evaluate the time needed to complete a SQL query.
Could you explain how you get the percent of execution for nodes of plan
tree and overall for query?
A further improvement would be to report the memory, disk and time
resources used by the monitored backend. An overuse of memory, disk or
time resources can prevent the SQL query from completing.
This functionality is somehow implemented in explain.c. You can see my
patch to this file [2]. I only manipulate runtime statistics (data in
the structure 'Instrumentation') to print the state of a running query.
1. /messages/by-id/24182.1472745492@sss.pgh.pa.us
2.
https://github.com/postgrespro/pg_query_state/blob/master/runtime_explain.patch
--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Maksim,
2017-04-18 20:31 GMT+02:00 Maksim Milyutin <m.milyutin@postgrespro.ru>:
Perhaps you will be forced to duplicate significant snippets of
functionality from explain.c into your progress.c.
Currently, little code is duplicated between the EXPLAIN and PROGRESS
commands. The duplicated code could be moved to the file
src/backend/commands/report.c, which is used to gather code shared between
the 2 commands. I will try to complete this code sharing as much as
possible.
The main point is that PROGRESS uses the same design pattern as EXPLAIN,
parsing the query tree. The workhorse of the PROGRESS command is
ProgressNode(), which recursively calls itself on sub-nodes until we reach
leaf nodes such as SeqScan, IndexScan, TupleStore, Sort, Material, ... .
The EXPLAIN command uses a similar workhorse, the function ExplainNode(),
which eventually calls different leaf node functions.

Some of the leaf nodes which are common to the 2 commands have been put in
the file src/backend/commands/report.c. Maybe some further code sharing is
also possible for the workhorse, by using a template function which would
call EXPLAIN-specific or PROGRESS-specific leaf node functions.
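The dispatch-then-recurse shape of such a workhorse can be sketched as
below. The node layout is hypothetical (one child per node, three node
types), just to show the structure shared by ProgressNode() and
ExplainNode(), not the actual patch code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified plan node. */
typedef enum { NODE_GATHER_MERGE, NODE_SORT, NODE_SEQSCAN } NodeTag;

typedef struct PlanNode
{
    NodeTag          tag;
    long             done;    /* e.g. rows fetched so far */
    long             total;   /* e.g. estimated total rows */
    struct PlanNode *child;   /* single child, for simplicity */
} PlanNode;

/*
 * Walk the plan tree and count the leaf nodes for which a progress
 * line would be emitted; the real code would print the node label and
 * its "done/total" counters here instead of counting.
 */
int
progress_walk(const PlanNode *node)
{
    if (node == NULL)         /* tolerate partially built trees */
        return 0;

    switch (node->tag)
    {
        case NODE_SEQSCAN:    /* leaf node: one report line */
            return 1 + progress_walk(node->child);
        default:              /* inner node: just recurse */
            return progress_walk(node->child);
    }
}
```

A template function shared by the two commands would keep this walk and
only swap the per-node callbacks.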
Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.

It's an interesting question - how much the active query is in a usable
state at the stage of execution. Tom Lane noticed that CHECK_FOR_INTERRUPTS
doesn't give us a 100% guarantee about full consistency [1].
I wonder what you mean about usable state.
Currently, the suggested code tests the queryDesc pointer and all the sub
node pointers in order to detect NULL pointers. When the progress report
is collected by the backend, this backend does the collection and
consequently does not run the query. So the query tree is not being
modified. At this moment, whatever the query state, we can manage to deal
with its static state. It is only a tree, which could also be just a NULL
pointer in the most extreme case. Such a case is dealt with in the current
code.
Could you explain how you get the percent of execution for nodes of the
plan tree and overall for the query?
The progress of execution of the query is computed as follows, in 2
different places for each leaf node type (Scan, IndexScan, Sort, Material,
TupleStore, ...):
- one place in the executor code, or in the access methods code, or in the
sort utilities code, used during the execution of the SQL query, in which
the following values are counted, for instance: rows R/W, blocks R/W,
tapes R/W used for sorts, tuple store R/W, ... . Some of these values are
already computed in the current PostgreSQL official source code. Some
other values had to be added and collected.
- one place in the leaf function of each node type (ProgressScan(),
ProgressSort(), ...), in which percentages are computed and then dumped in
the report, together with the raw values collected during execution. The
level of detail can be selected with the VERBOSE option of the PROGRESS
command (for instance: # PROGRESS VERBOSE $pid).
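The percentage shown in the report can then be a plain ratio of the
collected counter to the planner's estimate. A minimal sketch (the
function name is mine, not the patch's):

```c
#include <assert.h>

/*
 * Derive a percentage at report time from a raw counter collected
 * during execution and the planner's estimated total.
 */
int
progress_percent(long done, long total)
{
    if (total <= 0)
        return 0;             /* no estimate yet: report 0% */

    /*
     * Planner estimates can be exceeded, so values above 100% are
     * possible (the DELETE example below reaches 131%).
     */
    return (int) ((done * 100) / total);
}
```

For example, 2751606 rows done out of an estimated 3954135 yields 69%, the
value shown in the Parallel Seq Scan line of the first example.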
For instance:
1/ read/write rows are collected when running the executor in the file
src/backend/executor/execProcnode.c
==============================================================================
This is already implemented in the current official source tree. Nothing
more needs to be implemented to collect values (the total row count and
the number of rows already fetched are collected).
The report is implemented in leaf function ProgressScanRows().
2/ read/write blocks are collected in the file
src/backend/access/heap/heapam.c
==========================================================
This is already implemented in the current official source tree. Nothing
more needs to be implemented to collect values during execution.
The report is implemented in a leaf function ProgressScanBlks().
3/ the sort progressions are collected in the file
src/backend/utils/sort/tuplesort.c
==========================================================
[root@rco pg]# git diff --stat master.. src/backend/utils/sort/tuplesort.c
 src/backend/utils/sort/tuplesort.c | 142 ++++++++++++++++++++++---------
 1 file changed, 127 insertions(+), 15 deletions(-)
[root@rco pg]#
New fields have been added to compute the different I/O
(read/write/merge/...) per tape, for instance during a sort on tapes.
The report of the Sort node is implemented in the leaf function
ProgressSort().
4/ the tuple store progressions are computed in the file
src/backend/utils/sort/tuplestore.c
=================================================================
[root@rco pg]# git diff --stat master.. src/backend/utils/sort/tuplestore.c
 src/backend/utils/sort/tuplestore.c | 73 +++++++++++++++++-----------
 1 file changed, 56 insertions(+), 17 deletions(-)
[root@rco pg]#
New fields have been added to collect the I/O needed for such a tuple
store.
The report of the TupleStore node is implemented in the leaf function
ProgressTupleStore().
Other node types have been implemented: TidScan, IndexScan, LimitScan,
CustomScan, Hash, ModifyTable.
Such nodes may require some new fields to collect values during the SQL
query execution.
Overall, the overhead caused by the new values collected during SQL query
execution is very low: only a few values need to be collected.
This functionality is somehow implemented in explain.c. You can see my
patch to this file [2]. I only manipulate runtime statistics (data in the
structure 'Instrumentation') to print the state of a running query.

I will check your patch and try to add such a feature to the current
patch. It provides a valuable hint for estimating whether a SQL query has
a chance to complete without reaching the resource limits.
1. /messages/by-id/24182.1472745492@sss.pgh.pa.us
2. https://github.com/postgrespro/pg_query_state/blob/master/runtime_explain.patch

Thanks for your suggestions and comments.

Regards
Remi
Following on previous email....
I have added below some use cases which I find very relevant when we need
to know the progress of a SQL query.
The command can be used with any SQL query (SELECT, UPDATE, DELETE,
INSERT).
The tables used have been created with :
CREATE TABLE T_1M (id integer, md5 text);
INSERT INTO T_1M SELECT generate_series(1,1000000) AS id,
md5(random()::text) AS md5;
CREATE TABLE T_10M ( id integer, md5 text);
INSERT INTO T_10M SELECT generate_series(1,10000000) AS id,
md5(random()::text) AS md5;
All the different leaf node types are implemented.
1/ Parallel select with sort (no index)
===========================
=> Terminal running the long SQL query:
test=# select * from t_10M order by md5;
=> Terminal monitoring SQL query progression:
test=# select pid,query from pg_stat_activity ;
pid | query
-------+------------------------------------------
8062 |
8065 |
19605 | select pid,query from pg_stat_activity ;
20830 | select * from t_10M order by md5;
20832 | select * from t_10M order by md5;
20833 | select * from t_10M order by md5;
8060 |
8059 |
8061 |
(9 rows)
test=# PROGRESS 20830
test-# ;
PLAN PROGRESS
------------------------------------------------------------------------
Gather Merge
-> Sort=> dumping tuples to tapes / merging tapes
rows r/w merge 2167923/2167908 rows r/w effective 0/3514320 0%
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)
test=#
test=#
test=# PROGRESS 20830;
PLAN PROGRESS
----------------------------------------------------------------------------
Gather Merge
-> Sort=> final merge sort on tapes
rows r/w merge 4707198/4691167 rows r/w effective 16016/3514320 0%
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
-----------------------------------------------------------------------------
Gather Merge
-> Sort=> final merge sort on tapes
rows r/w merge 4809857/4691167 rows r/w effective 118675/3514320 3%
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
-----------------------------------------------------------------------------
Gather Merge
-> Sort=> final merge sort on tapes
rows r/w merge 4883715/4691167 rows r/w effective 192533/3514320 5%
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
-----------------------------------------------------------------------------
Gather Merge
-> Sort=> final merge sort on tapes
rows r/w merge 4948381/4691167 rows r/w effective 257199/3514320 7%
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
-----------------------------------------------------------------------------
Gather Merge
-> Sort=> final merge sort on tapes
rows r/w merge 5022137/4691167 rows r/w effective 330955/3514320 9%
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
------------------------------------------------------------------------------
Gather Merge
-> Sort=> final merge sort on tapes
rows r/w merge 5079083/4691167 rows r/w effective 387901/3514320 11%
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
------------------------------------------------------------------------------
Gather Merge
-> Sort=> final merge sort on tapes
rows r/w merge 5144499/4691167 rows r/w effective 453317/3514320 12%
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
----------------------
<out of transaction>
(1 row)
test=#
The SQL query was interrupted before completion.
2/ Insert into table
=============
=> Terminal running the long SQL query:
test=# INSERT INTO T_10M SELECT generate_series(10000001, 12000000) AS id,
md5(random()::text) AS md5;
=> Terminal monitoring SQL query progression:
test=# PROGRESS 20830;
PLAN PROGRESS
----------------------
<out of transaction>
(1 row)
test=#
test=# PROGRESS 20830;
PLAN PROGRESS
-----------------------
Insert => rows 718161
-> ProjectSet
-> Result
(3 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
------------------------
Insert => rows 1370255
-> ProjectSet
-> Result
(3 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
------------------------
Insert => rows 1916731
-> ProjectSet
-> Result
(3 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
----------------
<idle backend>
(1 row)
test=#
3/ Delete with like clause
===================
=> Terminal running the long SQL query:
test=# DELETE FROM T_10M WHERE md5 like '%cb%';
=> Terminal monitoring SQL query progression:
test=# PROGRESS 20830;
PLAN PROGRESS
----------------
<idle backend>
(1 row)
test=# PROGRESS 20830;
PLAN PROGRESS
----------------------------------------------------------------------
Delete => rows 91906
-> Seq Scan on t_10m => rows 91906/848485 10% blks 6668/100000 6%
Filter: (md5 ~~ '%cb%'::text)
(3 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
-------------------------------------------------------------------------
Delete => rows 151900
-> Seq Scan on t_10m => rows 151900/848485 17% blks 11019/100000 11%
Filter: (md5 ~~ '%cb%'::text)
(3 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
-------------------------------------------------------------------------
Delete => rows 309533
-> Seq Scan on t_10m => rows 309533/848485 36% blks 22471/100000 22%
Filter: (md5 ~~ '%cb%'::text)
(3 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
-------------------------------------------------------------------------
Delete => rows 705968
-> Seq Scan on t_10m => rows 705968/848485 83% blks 51274/100000 51%
Filter: (md5 ~~ '%cb%'::text)
(3 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
--------------------------------------------------------------------------
Delete => rows 913843
-> Seq Scan on t_10m => rows 913843/848485 107% blks 66417/100000 66%
Filter: (md5 ~~ '%cb%'::text)
(3 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
---------------------------------------------------------------------------
Delete => rows 1113104
-> Seq Scan on t_10m => rows 1113104/848485 131% blks 80881/100000 80%
Filter: (md5 ~~ '%cb%'::text)
(3 rows)
test=# PROGRESS 20830;
PLAN PROGRESS
----------------
<idle backend>
(1 row)
test=#
The above monitoring report shows:
- the Seq Scan node, with the number of rows scanned and the number of
blocks scanned/read.
- the Delete node, with the number of rows deleted.
4/ Select with offset and limit clause
===========================
=> Terminal running the long SQL query:
test=# select * from t_10M order by md5 offset 80 limit 10;
=> Terminal monitoring SQL query progression:
test=# \watch PROGRESS 20830;
Wed 19 Apr 2017 04:36:16 PM CEST (every 1s)
PLAN PROGRESS
----------------
<idle backend>
(1 row)
Wed 19 Apr 2017 04:36:17 PM CEST (every 1s)
PLAN PROGRESS
----------------
<idle backend>
(1 row)
Wed 19 Apr 2017 04:36:18 PM CEST (every 1s)
PLAN PROGRESS
------------------------------------------------------------------------------
Limit => offset 0% limit 0%
-> Sort=> loading tuples in memory 90
Sort Key: md5
-> Seq Scan on t_10m => rows 174392/11586584 1% blks 1640/100000 1%
(4 rows)
...
PLAN PROGRESS
----------------------------------------------------------------------------------
Limit => offset 0% limit 0%
-> Sort=> loading tuples in memory 90
Sort Key: md5
-> Seq Scan on t_10m => rows 1656828/11586584 14% blks 15600/100000 15%
(4 rows)
...
PLAN PROGRESS
----------------------------------------------------------------------------------
Limit => offset 0% limit 0%
-> Sort=> loading tuples in memory 90
Sort Key: md5
-> Seq Scan on t_10m => rows 4954207/11586584 42% blks 46640/100000 46%
(4 rows)
Wed 19 Apr 2017 04:36:35 PM CEST (every 1s)
...
PLAN PROGRESS
----------------------------------------------------------------------------------
Limit => offset 0% limit 0%
-> Sort=> loading tuples in memory 90
Sort Key: md5
-> Seq Scan on t_10m => rows 7837687/11586584 67% blks 73772/100000 73%
(4 rows)
Wed 19 Apr 2017 04:36:41 PM CEST (every 1s)
...
PLAN PROGRESS
-----------------------------------------------------------------------------------
Limit => offset 0% limit 0%
-> Sort=> loading tuples in memory 90
Sort Key: md5
-> Seq Scan on t_10m => rows 10378786/11586584 89% blks 97690/100000 97%
(4 rows)
Wed 19 Apr 2017 04:36:49 PM CEST (every 1s)
PLAN PROGRESS
----------------
<idle backend>
(1 row)
5/ Sample scan
=============
=> Terminal running the long SQL query:
# select * from t_10m tablesample system(50);
=> Terminal monitoring SQL query progression:
PLAN PROGRESS
----------------
<idle backend>
(1 row)
Wed 19 Apr 2017 04:44:12 PM CEST (every 1s)
PLAN PROGRESS
-----------------------------------------------------------------------
Sample Scan on t_10m => rows 783274/5793292 13% blks 14616/100000 14%
Sampling: system ('50'::real)
(2 rows)
Wed 19 Apr 2017 04:44:13 PM CEST (every 1s)
PLAN PROGRESS
------------------------------------------------------------------------
Sample Scan on t_10m => rows 2514675/5793292 43% blks 47076/100000 47%
Sampling: system ('50'::real)
(2 rows)
Wed 19 Apr 2017 04:44:14 PM CEST (every 1s)
PLAN PROGRESS
------------------------------------------------------------------------
Sample Scan on t_10m => rows 4031400/5793292 69% blks 75625/100000 75%
Sampling: system ('50'::real)
(2 rows)
Wed 19 Apr 2017 04:44:15 PM CEST (every 1s)
PLAN PROGRESS
----------------
<idle backend>
(1 row)
On 19.04.2017 17:13, Remi Colinet wrote:
Maksim,
I wonder what you mean about usable state.
A usable query state is suitable for analysis, IOW we have a consistent
QueryDesc object. This term was introduced by Tom Lane in [1]. I suppose
he meant the case when a query fails with an error and, before the
transaction aborts, we bump into CHECK_FOR_INTERRUPTS in a place where the
QueryDesc may be inconsistent and further reading from it will give an
invalid result.
Perhaps a deep checking of the QueryDesc would allow us to consider it
consistent.
1. /messages/by-id/24182.1472745492@sss.pgh.pa.us
--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
2017-04-19 18:41 GMT+02:00 Maksim Milyutin <m.milyutin@postgrespro.ru>:
A usable query state is suitable for analysis, IOW we have a consistent
QueryDesc object. This term was introduced by Tom Lane in [1]. I suppose
he meant the case when a query fails with an error and, before the
transaction aborts, we bump into CHECK_FOR_INTERRUPTS in a place where the
QueryDesc may be inconsistent and further reading from it will give an
invalid result.
I could indeed trigger a segmentation fault because the nodes of the tree
may be in the middle of being freed. Some nodes may be partially filled, for
instance. But each node can be checked against a null pointer once the
monitored backend is no longer executing its query and is dumping its
progress state. So this is not a big deal in fact.
Currently, the suggested code tests the queryDesc pointer and all the
sub-node pointers in order to detect NULL pointers. When the progress
report is collected by the backend, this backend does the collection and
consequently does not run the query. So the query tree is not being
modified. At this moment, whatever the query state, we can manage to
deal with its static state. It is only a tree, which could also be just a
NULL pointer in the most extreme case. Such a case is dealt with in the
current code.

Perhaps the deep checking of QueryDesc would allow us to consider it
consistent.

1. /messages/by-id/24182.1472745492@sss.pgh.pa.us
--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
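A minimal sketch of the defensive tree walk discussed above, assuming a
hypothetical stub node type rather than PostgreSQL's real PlanState nodes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stub of a plan-tree node; the real patch walks the
 * executor's PlanState tree instead. */
typedef struct PlanNodeStub
{
    const char *label;
    struct PlanNodeStub *left;
    struct PlanNodeStub *right;
} PlanNodeStub;

/* Count reachable nodes defensively: every pointer is tested before it
 * is dereferenced, so a tree that is NULL or only partially built
 * cannot crash the walk. */
static int count_plan_nodes(const PlanNodeStub *node)
{
    if (node == NULL)
        return 0;
    return 1
        + count_plan_nodes(node->left)
        + count_plan_nodes(node->right);
}
```

The same check-before-dereference discipline applies to every field the
progress report reads, which is what makes a NULL or partial tree harmless.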
On Mon, Apr 17, 2017 at 9:09 PM, Remi Colinet <remi.colinet@gmail.com>
wrote:
Hello,
I've implemented a new command named PROGRESS to monitor progression of
long running SQL queries in a backend process.Thank you for the patch.
I am testing your patch but after applying your patch other regression test
failed.
$ make installcheck
13 of 178 tests failed.
Regards,
Vinayak
On 5 May 2017 at 22:38, Vinayak Pokale <vinpokale@gmail.com> wrote:
On Mon, Apr 17, 2017 at 9:09 PM, Remi Colinet <remi.colinet@gmail.com>
wrote:
Hello,
I've implemented a new command named PROGRESS to monitor progression of
long running SQL queries in a backend process.

Thank you for the patch.
sorry if i'm bikeshedding too soon but... why a command instead of a function?
something like pg_progress_backend() will be in sync with
pg_cancel_backend()/pg_terminate_backend() and the result of such a
function could be usable by a tool to examine a slow query status
--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Do you have more details about the failed tests?
Regards,
Remi
2017-05-06 5:38 GMT+02:00 Vinayak Pokale <vinpokale@gmail.com>:
That's a good point.
A command is more straightforward because it targets only one backend.
The user is supposed to know which backend pid is taking a long time to
complete based on pg_stat_activity().
This is somewhat the same approach as the EXPLAIN command.
But its use is limited to the psql utility. And this adds one more command.
I see 2 possible choices:
1 - either convert the command into a table.
This is the way it is done in the Oracle database with the v$session_longops
view. Obviously, this requires probing the status of each backend. This
inconvenience can be mitigated by using a threshold of a few seconds before
reporting a backend's progression. v$session_longops only reports long
running queries after at least 6 seconds of execution.
This is less efficient than directly targeting a given pid or backend id.
But this is far better for SQL.
2 - or convert the command into a function.
The advantage of a function is that it can accept parameters. So parameters
could be the pid of the backend, the verbosity level, the format (text,
json, ...).
This would not reduce the options of the current command. And then a view
could be created on top of the function.
Maybe a mix of both, a function with parameters and a view created on the
function, is the solution.
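The threshold mitigation mentioned for choice 1 could look roughly like the
following sketch. All names here (BackendStub, select_long_running) are
invented for illustration; a real view would derive the elapsed time from
pg_stat_activity's query_start.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-backend entry. */
typedef struct BackendStub
{
    int    pid;
    double elapsed_seconds;
} BackendStub;

/* Select the pids worth probing: only backends running longer than the
 * threshold (e.g. 6 seconds, as with Oracle's v$session_longops) are
 * signalled, bounding the cost of a view that scans every backend.
 * Returns the number of pids written to out[]. */
static int select_long_running(const BackendStub *backends, int n,
                               double threshold_seconds, int *out)
{
    int count = 0;

    for (int i = 0; i < n; i++)
    {
        if (backends[i].elapsed_seconds >= threshold_seconds)
            out[count++] = backends[i].pid;
    }
    return count;
}
```

With such a filter, a table or view interface only pays the signalling cost
for backends that have actually been running for a while.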
Regards
Remi
2017-05-06 5:57 GMT+02:00 Jaime Casanova <jaime.casanova@2ndquadrant.com>:
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services