Collecting statistics about contents of JSONB columns
Hi,
One of the complaints I sometimes hear from users and customers using
Postgres to store JSON documents (as JSONB type, of course) is that the
selectivity estimates are often pretty poor.
Currently we only really have MCVs and histograms of whole documents,
and we can deduce some stats from that. But that is somewhat bogus,
because there are only ~100 documents in either the MCV or histogram
(with the default statistics target). Moreover, we discard all
"oversized" values (over 1kB) before even calculating those stats, which
makes them even less representative.
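To make that concrete, here's a minimal sketch (the table and data are
made up for illustration) of what the planner has to work with today:

    -- The estimate for @> can only use whole-document MCV/histogram
    -- entries plus a default selectivity - there are no stats for the
    -- individual key "a".
    CREATE TABLE docs (doc jsonb);
    INSERT INTO docs
      SELECT jsonb_build_object('a', i % 10)
      FROM generate_series(1, 100000) i;
    ANALYZE docs;
    EXPLAIN SELECT * FROM docs WHERE doc @> '{"a": 1}';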
A couple weeks ago I started playing with this, and I experimented with
improving extended statistics in this direction. After a while I noticed
a forgotten development branch from 2016 which tried to do this by
adding a custom typanalyze function, which seemed like a more natural
idea (because it's really statistics for a single column).
But then I went to pgconf NYC in early December, and I spoke to Oleg
about various JSON-related things. He mentioned they'd been working on
this topic some time ago too, but did not have time to pursue it, and
he pointed me to a branch [1] developed by Nikita Glukhov.
I like Nikita's branch because it solves a couple of architectural
issues I'd been aware of but had only solved in a rather hackish way.
I had a discussion with Nikita about his approach and what we can do to
move it forward. He's focusing on other JSON stuff, but he's OK with me
taking over and moving it forward. So here we go ...
Nikita rebased his branch recently, and I've kept improving it in
various ways (mostly a lot of comments and docs, plus some minor fixes
and tweaks). I've pushed my version with a couple of extra commits in
[2], but you can ignore that unless you want to see what I
added/changed.
Attached are a couple of patches adding the main part of the feature.
There are a couple more commits in the github repositories adding more
advanced features - I'll briefly explain those later, but I'm not
including them here because they're optional and it'd be distracting.
There are 6 patches in the series, but the magic mostly happens in parts
0001 and 0006. The other parts are mostly just adding infrastructure,
which may be a sizeable amount of code, but the changes are fairly
simple and obvious. So let's focus on 0001 and 0006.
To add JSON statistics we need to do two basic things - we need to build
the statistics and then we need to allow using them while estimating
conditions.
1) building stats
Let's talk about building the stats first. The patch does one of the
things I experimented with - 0006 adds a jsonb_typanalyze function, and
it associates it with the data type. The function extracts paths and
values from the JSONB document, builds the statistics, and then stores
the result in pg_statistic as a new stakind.
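For illustration, assuming the patched jsonb_typanalyze has run on the
hypothetical docs table from above, the stored stats could be peeked at
roughly like this (pg_statistic is superuser-only, and which stakind
slot is used is up to the patch):

    SELECT s.stakind1, s.stakind2, s.stavalues1
    FROM pg_statistic s
    JOIN pg_class c ON c.oid = s.starelid
    JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum = s.staattnum
    WHERE c.relname = 'docs' AND a.attname = 'doc';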
I've been planning to store the stats in pg_statistic too, but I've been
considering using a custom data type. The patch does something far more
elegant - it simply uses stavalues to store an array of JSONB documents,
each describing stats for one path extracted from the sampled documents.
One (very simple) element of the array might look like this:
{"freq": 1,
"json": {
"mcv": {
"values": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"numbers": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
"width": 19,
"distinct": 10,
"nullfrac": 0,
"correlation": 0.10449},
"path": "$.\"a\"",
"freq_null": 0, "freq_array": 0, "freq_object": 0,
"freq_string": 0, "freq_boolean": 0, "freq_numeric": 0}
In this case there's only an MCV list (represented by two arrays, just
like in pg_statistic), but there might be another part with a histogram.
There are also the other fields we'd expect to see in pg_statistic.
In principle, we need a pg_statistic entry for each path we extract
from the JSON documents and deem interesting enough for estimation.
There are probably other ways to serialize/represent this, but I find
using JSONB for it pretty convenient, because we need to deal with a mix
of data types (for the same path) and other JSON-specific stuff. Storing
that in Postgres arrays would be problematic.
I'm sure there are plenty of open questions - for example I think we'll need
some logic to decide which paths to keep, otherwise the statistics can
get quite big, if we're dealing with large / variable documents. We're
already doing something similar for MCV lists.
One of Nikita's patches not included in this thread allows "selective"
statistics, where you can define in advance a "filter" restricting which
parts are considered interesting by ANALYZE. That's interesting, but I
think we need some simple MCV-like heuristics first anyway.
Another open question is how deep the stats should be. Imagine documents
like this:
{"a" : {"b" : {"c" : {"d" : ...}}}}
The current patch builds stats for all possible paths:
"a"
"a.b"
"a.b.c"
"a.b.c.d"
and of course many of the paths will often have JSONB documents as
values, not simple scalar values. I wonder if we should limit the depth
somehow, and maybe build stats only for scalar values.
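Just to illustrate what "all possible paths" means, here's a sketch
enumerating the key paths of a single document in plain SQL (the
typanalyze code does the equivalent in C across the whole sample):

    WITH RECURSIVE paths(path, value) AS (
        SELECT key, value
        FROM jsonb_each('{"a": {"b": {"c": {"d": 1}}}}'::jsonb)
      UNION ALL
        SELECT p.path || '.' || e.key, e.value
        FROM paths p
        CROSS JOIN LATERAL jsonb_each(
            CASE WHEN jsonb_typeof(p.value) = 'object'
                 THEN p.value ELSE '{}'::jsonb END) e
    )
    SELECT path FROM paths;  -- a, a.b, a.b.c, a.b.c.d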
2) applying the statistics
One of the problems is how to actually use the statistics. For the @>
operator it's simple enough, because it's (jsonb @> jsonb), so we have
direct access to the stats. But often the conditions look like this:
jsonb_column ->> 'key' = 'value'
so the condition is actually on an expression, not on the JSONB column
directly. My solutions were pretty ugly hacks, but Nikita had a neat
idea - we can define a custom procedure for each operator, which is
responsible for "calculating" the statistics for the expression.
So in this case "->>" will have such an "oprstat" procedure, which
fetches stats for the JSONB column and extracts stats for the "key"
path. And then we can use that for estimation of the (text = text)
condition.
This is what 0001 does, pretty much. We simply look for expression stats
provided by an index or extended statistics, and then - if oprstat is
defined for the operator - we try to derive the stats.
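For contrast, the only way to get such expression stats today is to set
them up manually, e.g. with an expression index (or CREATE STATISTICS on
the expression in PG 14+), which makes ANALYZE collect stats for that
one expression - roughly what oprstat would derive automatically,
without any setup (again using the made-up docs table):

    CREATE INDEX docs_a_idx ON docs ((doc ->> 'a'));
    ANALYZE docs;
    EXPLAIN SELECT * FROM docs WHERE doc ->> 'a' = '1';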
This opens other interesting opportunities for the future - one of the
parts adds oprstat for basic arithmetic operators, which allows deducing
statistics for expressions like (a+10) from statistics on column (a).
Which seems like a neat feature on its own, but it also interacts with
the extended statistics in somewhat non-obvious ways (especially when
estimating GROUP BY cardinalities).
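A sketch of what that buys us - without derived stats, a clause on
(a + 10) falls back to a default inequality selectivity (about 33%, I
believe), even though the per-column stats on a determine it exactly:

    CREATE TABLE t (a int);
    INSERT INTO t SELECT i % 100 FROM generate_series(1, 100000) i;
    ANALYZE t;
    -- With oprstat on +, the stats for "a" could be shifted by 10 and
    -- used directly to estimate this.
    EXPLAIN SELECT * FROM t WHERE (a + 10) < 20;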
Of course, there's a limit of what we can reasonably estimate - for
example, there may be statistical dependencies between paths, and this
patch does not even attempt to deal with that. In a way, this is similar
to correlation between columns, except that here we have a dynamic set
of columns, which makes it much much harder. We'd need something like
extended stats on steroids, pretty much.
I'm sure I've forgotten various important bits - many of them are
mentioned or explained in comments, but I'm sure others are not. And I'd
bet there are things I forgot about entirely or got wrong. So feel free
to ask.
In any case, I think this seems like a good first step to improve our
estimates for JSONB columns.
regards
[1]: https://github.com/postgrespro/postgres/tree/jsonb_stats
[2]: https://github.com/tvondra/postgres/tree/jsonb_stats_rework
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-Add-pg_operator.oprstat-for-derived-operato-20211230.patch
0002-Add-stats_form_tuple-20211230.patch
0003-Add-symbolic-names-for-some-jsonb-operators-20211230.patch
0004-Add-helper-jsonb-functions-and-macros-20211230.patch
0005-Export-scalarineqsel-20211230.patch
0006-Add-jsonb-statistics-20211230.patch
On Fri, Dec 31, 2021 at 2:07 PM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
[...]
Hi,
For patch 1:
+ List *statisticsName = NIL; /* optional stats estimat. procedure */
I think if the variable is named estimatorName (or something similar), it
would be easier for people to grasp its purpose.
+ /* XXX perhaps full "statistics" wording would be better */
+ else if (strcmp(defel->defname, "stats") == 0)
I would recommend (stats sounds too general):
+ else if (strcmp(defel->defname, "statsestimator") == 0)
+ statisticsOid = ValidateStatisticsEstimator(statisticsName);
statisticsOid -> statsEstimatorOid
For get_oprstat():
+ }
+ else
+ return (RegProcedure) InvalidOid;
The else keyword is not needed (considering the return statement in the if block).
For patch 06:
+ /* FIXME Could be before the memset, I guess? Checking
vardata->statsTuple. */
+ if (!data->statsTuple)
+ return false;
I would agree the check can be lifted above the memset call.
+ * XXX This does not really extract any stats, it merely allocates the
struct?
+ */
+static JsonPathStats
+jsonPathStatsGetSpecialStats(JsonPathStats pstats, JsonPathStatsType type)
As the comment says, I think allocJsonPathStats() would be a better name
for the func.
+ * XXX Why doesn't this do jsonPathStatsGetTypeFreq check similar to what
+ * jsonPathStatsGetLengthStats does?
I think `jsonPathStatsGetTypeFreq(pstats, jbvArray, 0.0) <= 0.0` check
should be added for jsonPathStatsGetArrayLengthStats().
To be continued ...
On Sat, Jan 1, 2022 at 7:33 AM Zhihong Yu <zyu@yugabyte.com> wrote:
[...]
Hi,
+static JsonPathStats
+jsonStatsFindPathStats(JsonStats jsdata, char *path, int pathlen)
Stats appears twice in the method name. I think findJsonPathStats() should
suffice.
It should check `if (jsdata->nullfrac >= 1.0)` as jsonStatsGetPathStatsStr
does.
+JsonPathStats
+jsonStatsGetPathStatsStr(JsonStats jsdata, const char *subpath, int
subpathlen)
This func can be static, right ?
I think findJsonPathStatsWithPrefix() would be a better name for the func.
+ * XXX Doesn't this need ecape_json too?
+ */
+static void
+jsonPathAppendEntryWithLen(StringInfo path, const char *entry, int len)
+{
+ char *tmpentry = pnstrdup(entry, len);
+ jsonPathAppendEntry(path, tmpentry);
escape_json() is called within jsonPathAppendEntry(). The XXX comment can
be dropped.
+jsonPathStatsGetArrayIndexSelectivity(JsonPathStats pstats, int index)
It seems getJsonSelectivityWithArrayIndex() would be a better name.
+ sel = scalarineqsel(NULL, operator,
+ operator == JsonbGtOperator ||
+ operator == JsonbGeOperator,
+ operator == JsonbLeOperator ||
+ operator == JsonbGeOperator,
Looking at the comment for scalarineqsel():
* scalarineqsel - Selectivity of "<", "<=", ">", ">=" for scalars.
*
* This is the guts of scalarltsel/scalarlesel/scalargtsel/scalargesel.
* The isgt and iseq flags distinguish which of the four cases apply.
It seems JsonbLtOperator doesn't appear in the call, can I ask why ?
Cheers
On Sat, Jan 1, 2022 at 11:07 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
0006-Add-jsonb-statistics-20211230.patch
Hi Tomas,
-CREATE OR REPLACE FUNCTION explain_jsonb(sql_query text)
+CREATE OR REPLACE FUNCTION explain_jsonb(sql_query text)
https://cirrus-ci.com/task/6405547984420864
It looks like there is a Unicode BOM sequence in there, which is
upsetting our Windows testing but not the 3 Unixes (not sure why).
Probably added by an editor.
On Fri, 31 Dec 2021 at 22:07, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
The patch does something far more
elegant - it simply uses stavalues to store an array of JSONB documents,
each describing stats for one path extracted from the sampled documents.
Sounds good.
[...]
One of Nikita's patches not included in this thread allows "selective"
statistics, where you can define in advance a "filter" restricting which
parts are considered interesting by ANALYZE. That's interesting, but I
think we need some simple MCV-like heuristics first anyway.
[...]
The user interface for designing filters sounds hard, so I'd hope we
can ignore that for now.
--
Simon Riggs http://www.EnterpriseDB.com/
On 1/1/22 16:33, Zhihong Yu wrote:
Hi,
For patch 1:
+ List *statisticsName = NIL; /* optional stats estimat. procedure */
I think if the variable is named estimatorName (or something similar),
it would be easier for people to grasp its purpose.
I agree "statisticsName" might be too generic or confusing, but I'm not
sure "estimator" is an improvement. Because this is not an "estimator"
(in the sense of estimating selectivity). It "transforms" statistics to
match the expression.
+ /* XXX perhaps full "statistics" wording would be better */
+ else if (strcmp(defel->defname, "stats") == 0)
I would recommend (stats sounds too general):
+ else if (strcmp(defel->defname, "statsestimator") == 0)
+ statisticsOid = ValidateStatisticsEstimator(statisticsName);
statisticsOid -> statsEstimatorOid
Same issue with the "estimator" bit :-(
For get_oprstat():
+ }
+ else
+ return (RegProcedure) InvalidOid;
keyword else is not needed (considering the return statement in if block).
True, but this is how the other get_ functions in lsyscache.c do it.
Like get_oprjoin().
For patch 06:
+ /* FIXME Could be before the memset, I guess? Checking vardata->statsTuple. */
+ if (!data->statsTuple)
+ return false;
I would agree the check can be lifted above the memset call.
OK.
+ * XXX This does not really extract any stats, it merely allocates the struct?
+ */
+static JsonPathStats
+jsonPathStatsGetSpecialStats(JsonPathStats pstats, JsonPathStatsType type)
As the comment says, I think allocJsonPathStats() would be a better name
for the func.
+ * XXX Why doesn't this do jsonPathStatsGetTypeFreq check similar to what
+ * jsonPathStatsGetLengthStats does?
I think `jsonPathStatsGetTypeFreq(pstats, jbvArray, 0.0) <= 0.0` check
should be added for jsonPathStatsGetArrayLengthStats().
To be continued ...
OK. I'll see if Nikita has some ideas about the naming changes.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 1/1/22 22:16, Zhihong Yu wrote:
Hi,
+static JsonPathStats
+jsonStatsFindPathStats(JsonStats jsdata, char *path, int pathlen)
Stats appears twice in the method name. I think findJsonPathStats()
should suffice.
It should check `if (jsdata->nullfrac >= 1.0)` as jsonStatsGetPathStatsStr
does.
+JsonPathStats
+jsonStatsGetPathStatsStr(JsonStats jsdata, const char *subpath, int subpathlen)
This func can be static, right ?
I think findJsonPathStatsWithPrefix() would be a better name for the func.
+ * XXX Doesn't this need ecape_json too?
+ */
+static void
+jsonPathAppendEntryWithLen(StringInfo path, const char *entry, int len)
+{
+ char *tmpentry = pnstrdup(entry, len);
+ jsonPathAppendEntry(path, tmpentry);
escape_json() is called within jsonPathAppendEntry(). The XXX comment can
be dropped.
+jsonPathStatsGetArrayIndexSelectivity(JsonPathStats pstats, int index)
It seems getJsonSelectivityWithArrayIndex() would be a better name.
Thanks. I'll think about the naming changes.
+ sel = scalarineqsel(NULL, operator,
+                     operator == JsonbGtOperator ||
+                     operator == JsonbGeOperator,
+                     operator == JsonbLeOperator ||
+                     operator == JsonbGeOperator,
Looking at the comment for scalarineqsel():
* scalarineqsel - Selectivity of "<", "<=", ">", ">=" for scalars.
*
* This is the guts of scalarltsel/scalarlesel/scalargtsel/scalargesel.
* The isgt and iseq flags distinguish which of the four cases apply.
It seems JsonbLtOperator doesn't appear in the call, can I ask why ?
Because the scalarineqsel signature is this
scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
Oid collation,
VariableStatData *vardata, Datum constval,
Oid consttype)
so
/* is it greater or greater-or-equal? */
isgt = operator == JsonbGtOperator ||
       operator == JsonbGeOperator;

/* does it include equality (i.e. is it <= or >=)? */
iseq = operator == JsonbLeOperator ||
       operator == JsonbGeOperator;
So I think this is correct. A comment explaining this would be nice.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 1/5/22 21:13, Thomas Munro wrote:
On Sat, Jan 1, 2022 at 11:07 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
0006-Add-jsonb-statistics-20211230.patch
Hi Tomas,
-CREATE OR REPLACE FUNCTION explain_jsonb(sql_query text)
+CREATE OR REPLACE FUNCTION explain_jsonb(sql_query text)
https://cirrus-ci.com/task/6405547984420864
It looks like there is a Unicode BOM sequence in there, which is
upsetting our Windows testing but not the 3 Unixes (not sure why).
Probably added by an editor.
Thanks, fixed along with some whitespace issues.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-Add-pg_operator.oprstat-for-derived-operato-20220106.patch
0002-Add-stats_form_tuple-20220106.patch
0003-Add-symbolic-names-for-some-jsonb-operators-20220106.patch
0004-Add-helper-jsonb-functions-and-macros-20220106.patch
0005-Export-scalarineqsel-20220106.patch
0006-Add-jsonb-statistics-20220106.patch
On 1/5/22 21:22, Simon Riggs wrote:
On Fri, 31 Dec 2021 at 22:07, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
[...]
The user interface for designing filters sounds hard, so I'd hope we
can ignore that for now.
Not sure I understand. I wasn't suggesting any user-defined filtering,
but something done by default, similarly to what we do for regular MCV
lists, based on frequency. We'd include frequent paths while excluding
rare ones.
So no need for a user interface.
That might not work for documents with a stable schema and a lot of
top-level paths, because all the top-level paths have frequency 1.0. But
for documents with a dynamic schema (different documents having
different schemas/paths) it might help.
Similarly for the non-scalar values - I don't think we can really keep
regular statistics on such values (for the same reason it's not enough
for whole JSONB columns), so why build/store them at all?
Nikita did implement a way to specify custom filters using jsonpath, but
I did not include that in this patch series. And questions regarding
the interface were one of the reasons.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi!
I am glad that you found my very old patch interesting and started to
work on it. We failed to post it in 2016 mostly because we were not
satisfied with JSONB storage. Also we decided to wait for completion
of work on extended statistics as we thought that it could help us.
But in early 2017 we switched to SQL/JSON and forgot about this patch.
I think a custom datatype is necessary for better performance. With
plain JSONB we need to do a lot of work to extract path stats:
- iterate through MCV/Histogram JSONB arrays
- cast numeric values to float, string to text etc.
- build PG arrays from extracted datums
- form pg_statistic tuple.
With a custom data type we could store pg_statistic tuple unmodified
and use it without any overhead. But then we'd need to modify
VariableStatData and several functions a bit, to pass additional
nullfrac corrections.
Maybe simple record type (path text, data pg_statistic, ext_data jsonb)
would be enough.
Also there is an idea to store per-path stats separately in
pg_statistic_ext rows, using expressions like (jb #> '{foo,bar}') as
stxexprs (a sketch follows the list below). This could also help the
user select which paths to analyze, simply by using some sort of
CREATE STATISTICS. But it is really unclear how to:
* store pg_statistic_ext rows from typanalyze
* form path expressions for array elements (maybe add new jsonpath
operator)
* match various -> operator chains to stxexprs
* jsonpath-like languages need simple API for searching by stxexprs
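For reference, a fixed-path variant of this already works from SQL today
(the open questions above are about doing it automatically from
typanalyze); a hypothetical example, for a jsonb column "doc" in a table
"docs":

    CREATE STATISTICS docs_foobar_stats ON (doc #> '{foo,bar}')
      FROM docs;
    ANALYZE docs;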
Per-path statistics should only be collected for scalars. This can be
enabled by the flag JsonAnalyzeContext.scalarsOnly. But there is a
problem: the computed MCVs and histograms will be broken, and we will
not be able to use them for queries like (jb > const) in the general
case. Also, we will not be able to use them internally in
scalarineqsel() and var_eq_const() (see jsonSelectivity()). Instead, we
would have to implement our own estimator functions for the JSONB
comparison operators that correctly use our hacked MCVs and histograms
(and maybe not all cases will be supported; for example, comparison to
scalars only).
It's possible to replace objects and arrays with empty ones when
scalarsOnly is set to keep correct frequencies of non-scalars.
But there is an old bug in JSONB comparison: empty arrays are placed
before other values in the JSONB sort order, although they should go
after all scalars. So we also need to distinguish empty and non-empty
arrays here.
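(A quick way to probe the sort order issue described above - I'm
deliberately not asserting the output here:)

    SELECT v
    FROM (VALUES ('[]'::jsonb), ('null'), ('1'), ('"a"'), ('[1]')) t(v)
    ORDER BY v;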
I tried to fix a major part of the places marked as XXX and FIXME; the
new version of the patches is attached. There are a lot of changes, and
you can see them in step-by-step form in the corresponding branch
jsonb_stats_20220122 in our GitHub repo [1].
Below is the explanation of fixed XXXs:
Patch 0001
src/backend/commands/operatorcmds.c:
XXX Maybe not the right name, because "estimator" implies we're
calculating selectivity. But we're actually deriving statistics for
an expression.
Renamed to "Derivator".
XXX perhaps full "statistics" wording would be better
Renamed to "statistics".
src/backend/utils/adt/selfuncs.c:
examine_operator_expression():
XXX Not sure why this returns bool - we ignore the return value
anyway. We might as well return the calculated vardata (or NULL).
Oprstat was changed to return void.
XXX Not sure what to do about recursion - there can be another OpExpr
in one of the arguments, and we should call this recursively from the
oprstat procedure. But that's not possible, because it's marked as
static.
Oprstat call chain: get_restriction_variable() => examine_variable() =>
examine_operator_expression().
The main thing here is that an OpExpr with oprstat acts like an ordinary Var.
examine_variable():
XXX Shouldn't this put more restrictions on the OpExpr? E.g. that
one of the arguments has to be a Const or something?
This is simply a responsibility of oprstat.
Patch 0004
XXX Not sure we want to add these functions to jsonb.h, which is the
public API. Maybe it belongs rather to jsonb_typanalyze.c or
elsewhere, closer to how it's used?
Maybe it needs to be moved to the new file jsonb_utils.h. I think these
functions become very useful when we start to build JSONBs with a
predefined structure.
pushJsonbValue():
XXX I'm not quite sure why we actually do this? Why do we need to
change how JsonbValue is converted to Jsonb for the statistics patch?
Scalars in JSONB are encoded as one-element pseudo-array containers.
So when we insert binary JsonbValues that were initialized directly
from JSONB datums into another non-empty JSONB in pushJsonbValue(),
all values except scalars are inserted as expected, but scalars become
[scalar]. So we need to extract scalar values either in the caller
functions or in pushJsonbValue(). I think auto-extraction in
pushJsonbValue() is better. This case simply was not used before the
introduction of JSONB stats.
Patch 0006
src/backend/utils/adt/jsonb_selfuncs.c
jsonPathStatsGetSpecialStats()
XXX This does not really extract any stats, it merely allocates the
struct?
Renamed with "Alloc" suffix.
jsonPathStatsGetArrayLengthStats()
XXX Why doesn't this do jsonPathStatsGetTypeFreq check similar to
what jsonPathStatsGetLengthStats does?
"length" stats were collected inside parent paths, but "array_length"
and "avg_array_length" stats were collected inside child array paths.
This resulted in inconsistencies in TypeFreq checks.
I have removed "length" stats, moved "array_length" and
"avg_array_length" to the parent path, and added separate
"object_length" stats. TypeFreq checks become consistent.
XXX Seems to do essentially what jsonStatsFindPath, except that it
also considers jsdata->prefix. Seems fairly easy to combine those
into a single function.
I don't think it would be better to combine those functions into one
with a considerPrefix flag, because jsonStatsFindPathStats() is a kind
of low-level function which is called in only two places. In all other
places considerPrefix would be true. Also, jsonStatsFindPathStatsStr()
is exported in jsonb_selfuncs.h to give external jsonpath-like query
operators the ability to use JSON statistics.
jsonPathStatsCompare()
XXX Seems a bit convoluted to first cast it to Datum, then Jsonb ...
Datum const *pdatum = pv2;
Jsonb *jsonb = DatumGetJsonbP(*pdatum);
The problem with simply using 'jsonb = *(Jsonb **) pv2' is that
DatumGetJsonbP() may deTOAST datums.
XXX Not sure about this? Does empty path mean global stats?
Empty "path" is simply a sign of invalid stats data.
jsonStatsConvertArray()
FIXME Does this actually work on all 32/64-bit systems? What if typid
is FLOAT8OID or something? Should look at TypeCache instead, probably.
Used get_typlenbyvalalign() instead of hardcoded values.
jsonb_stats()
XXX It might be useful to allow recursion, i.e.
get_restriction_variable might derive statistics too.
I don't think it does that now, right?
get_restriction_variable() already can derive statistics: it calls
examine_variable() on both of its operands, and examine_variable()
calls examine_operator_expression().
XXX Could we also get varonleft=false in useful cases?
All supported JSONB operators have signature jsonb OP arg.
If varonleft = false, then we would need to derive stats for an
expression like
'{"foo": "bar"}' -> text_column
having stats for text_column. That is possible to implement too, but it
is not related to JSONB stats and should be done in a separate patch.
jsonSelectivityContains():
XXX This really needs more comments explaining the logic.
I have refactored this function and added comments.
jsonGetFloat4():
XXX Not sure assert is a sufficient protection against different
types of JSONB values to be passed in.
I have refactored this function by passing a default value for the
non-numeric JSONB case.
jsonPathAppendEntryWithLen()
XXX Doesn't this need ecape_json too?
Comment is removed, because jsonPathAppendEntry() is called inside.
jsonPathStatsGetTypeFreq()
FIXME This is really hard to read/understand, with two nested ternary
operators.
I have refactored this place.
XXX Seems more like an error, no? Why ignore it?
It's not an error, it's a request for a non-numeric type. Length values
are always numeric, and the frequency of non-numeric values is 0.
A possible case where it could happen is the jsonpath
'$.size() == "foo"'. The estimator for operator == will check the
frequency of strings in .size(), and it will be 0.
jsonPathStatsFormTuple()
FIXME What does this mean?
There is no need to transform ordinary root path stats tuple, it can be
simply copied.
jsonb_typanalyze.c
XXX We need entry+lenth because JSON path elements may contain null
bytes, I guess?
'entry' may not be zero-terminated when it points into JSONB keys, so
the 'len' field is necessary. 'len' is also used for faster entry
comparison and to distinguish array entries ('len' == -1).
XXX Sould be JsonPathEntryMatch as it deals with JsonPathEntry nodes
not whole paths, no?
XXX Again, maybe JsonPathEntryHash would be a better name?
Functions renamed using JsonPathEntry prefix, JsonPath typedef removed.
JsonPathMatch()
XXX Seems a bit silly to return int, when the return statement only
really returns bool (because of how it compares paths). It's not really
a comparator for sorting, for example.
This function is an implementation of HashCompareFunc, and it needs to
return int.
jsonAnalyzeJson()
XXX The name seems a bit weird, with the two json bits.
Renamed to jsonAnalyzeCollectPaths(). The two similar functions were
renamed too.
jsonAnalyzeJson():
XXX The mix of break/return statements in this block is really
confusing.
I have refactored this place using only breaks.
XXX not sure why we're doing this?
Manual recursion into containers, by creating a child iterator together
with the skipNested=true flag, is used to give jsonAnalyzeJsonValue()
the ability to access jbvBinary containers.
compute_json_stats()
XXX Not sure what the first branch is doing (or supposed to)?
XXX It's not immediately clear why this is (-1) and not simply
NULL. It crashes, so presumably it's used to tweak the behavior,
but it's not clear why/how, and it affects place that is pretty
far away, and so not obvious. We should use some sort of flag
with a descriptive name instead.
XXX If I understand correctly, we simply collect all paths first,
without accumulating any Values. And then in the next step we
process each path independently, probably to save memory (we
don't want to accumulate all values for all paths, with a lot
of duplicities).
There are two variants of stats collection:
* single-pass - collect all values for all paths
* multi-pass - collect values for only one path per pass
The first variant can consume too much memory (jsonb iteration produces
a lot of garbage etc.), but it works faster than the second.
The first 'if (false)' is used for manual selection of one of these
variants. This selection should be controlled by some user-specified
option (maybe a GUC), or the first variant could simply be removed.
jsonAnalyzeJson()'s parameter of type JsonPathAnlStats * determines
which paths we need to consider for value collection:
* NULL - collect values for all paths
* -1 - do not collect any values
* stats - collect values only for the given path
The last variant is actually unused, because we already have another
function, jsonAnalyzeJsonPath(), which is optimized for selective path
value collection (it uses object-key accessor JSONB functions instead
of full JSONB iteration). I have replaced this strange parameter with a
simple boolean flag.
XXX Could the parameters be different on other platforms?
Used get_typlenbyvalalign(JSONBOID) instead of hardcoded values.
jsonAnalyzePathValues()
XXX Do we need to initialize all slots?
I have copied here the following comment from extended_stats.c:
/*
* The fields describing the stats->stavalues[n] element types default
* to the type of the data being analyzed, but the type-specific
* typanalyze function can change them if it wants to store something
* else.
*/
XXX Not sure why we divide it by the number of json values?
We divide the counts of lengths by the total number of json values to
compute the correct nullfrac, i.e. not all input jsons have a length
(the length of a scalar json is NULL).
[1]: https://github.com/postgrespro/postgres/tree/jsonb_stats_20220122
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Add-pg_operator.oprstat-for-derived-operator-statist.patch
0002-Add-stats_form_tuple.patch
0003-Add-symbolic-names-for-some-jsonb-operators.patch
0004-Add-helper-jsonb-functions-and-macros.patch
0005-Export-scalarineqsel.patch
0006-Add-jsonb-statistics.patch
On 1/23/22 01:24, Nikita Glukhov wrote:
Hi!
I am glad that you found my very old patch interesting and started to
work on it. We failed to post it in 2016 mostly because we were not
satisfied with JSONB storage. Also we decided to wait for completion
of work on extended statistics as we thought that it could help us.
But in early 2017 we switched to SQL/JSON and forgot about this patch.
Understood. Let's see how feasible this idea is and if we can move this
forward.
I think a custom datatype is necessary for better performance. With
plain JSONB we need to do a lot of work to extract path stats:
- iterate through MCV/Histogram JSONB arrays
- cast numeric values to float, string to text etc.
- build PG arrays from extracted datums
- form pg_statistic tuple.
With a custom data type we could store pg_statistic tuple unmodified
and use it without any overhead. But then we need modify a bit
VariableStatData and several functions to pass additional nullfrac
corrections.
I'm not against evaluating/exploring alternative storage formats, but my
feeling is the real impact on performance will be fairly low. At least I
haven't seen this as very expensive while profiling the patch so far. Of
course, I may be wrong, and it may be more significant in some cases.
Maybe simple record type (path text, data pg_statistic, ext_data jsonb)
would be enough.
Maybe, but then you still need to store a bunch of those, right? So
either an array (likely toasted) or a 1:M table. I'm not sure it's going
to be much cheaper than JSONB.
I'd suggest we focus on what we need to store first, which seems like
the primary question, and worry about the exact storage format later.
Also there is an idea to store per-path stats separately in
pg_statistic_ext rows, using expressions like (jb #> '{foo,bar}') as
stxexprs. This could also help the user select which paths to analyze,
simply by using some sort of CREATE STATISTICS. But it is really
unclear how to:
* store pg_statistic_ext rows from typanalyze
* form path expressions for array elements (maybe add new jsonpath
operator)
* match various -> operator chains to stxexprs
* jsonpath-like languages need simple API for searching by stxexprs
Sure, you can do statistics on expressions, right? Of course, if that
expression produces a JSONB value, that's not very useful at the moment.
Maybe we could have two typanalyze functions - one for regular analyze,
one for extended statistics?
That being said, I'm not sure extended stats are a good match for this.
My feeling was we should collect these stats for all JSONB columns,
which is why I argued for putting that in pg_statistic.
Per-path statistics should only be collected for scalars. This can be
enabled by the flag JsonAnalyzeContext.scalarsOnly. But there is a
problem: the computed MCVs and histograms will be broken, and we will
not be able to use them for queries like (jb > const) in the general
case. Also, we will not be able to use them internally in
scalarineqsel() and var_eq_const() (see jsonSelectivity()). Instead, we
would have to implement our own estimator functions for the JSONB
comparison operators that correctly use our hacked MCVs and histograms
(and maybe not all cases will be supported; for example, comparison to
scalars only).
Yeah, but maybe that's an acceptable trade-off? I mean, if we can
improve the estimates for most clauses, and there's some number of
clauses that are estimated just as without stats, that's still an
improvement, right?
It's possible to replace objects and arrays with empty ones when
scalarsOnly is set to keep correct frequencies of non-scalars.
But there is an old bug in JSONB comparison: empty arrays are placed
before other values in the JSONB sort order, although they should go
after all scalars. So we also need to distinguish empty and non-empty
arrays here.
Hmmm ...
I tried to fix a major part of the places marked as XXX and FIXME; the
new version of the patches is attached. There are a lot of changes, and
you can see them in step-by-step form in the corresponding branch
jsonb_stats_20220122 in our GitHub repo [1].
Thanks! I'll go through the changes soon.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, 25 Jan 2022 at 03:50, Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
On 1/23/22 01:24, Nikita Glukhov wrote:
Hi!
I am glad that you found my very old patch interesting and started to
work on it. We failed to post it in 2016 mostly because we were not
satisfied with JSONB storage. Also we decided to wait for completion
of work on extended statistics as we thought that it could help us.
But in early 2017 we switched to SQL/JSON and forgot about this patch.Understood. Let's see how feasible this idea is and if we can move this
forward.I think custom datatype is necessary for better performance. With a
plain JSONB we need to do a lot of work for extraction of path stats:
- iterate through MCV/Histogram JSONB arrays
- cast numeric values to float, string to text etc.
- build PG arrays from extracted datums
- form pg_statistic tuple.With a custom data type we could store pg_statistic tuple unmodified
and use it without any overhead. But then we need modify a bit
VariableStatData and several functions to pass additional nullfrac
corrections.I'm not against evaluating/exploring alternative storage formats, but my
feeling is the real impact on performance will be fairly low. At least I
haven't seen this as very expensive while profiling the patch so far. Of
course, I may be wrong, and it may be more significant in some cases.Maybe simple record type (path text, data pg_statistic, ext_data jsonb)
would be enough.Maybe, but then you still need to store a bunch of those, right? So
either an array (likely toasted) or 1:M table. I'm not sure it's goiing
to be much cheaper than JSONB.I'd suggest we focus on what we need to store first, which seems like
tha primary question, and worry about the exact storage format then.Also there is an idea to store per-path separately in pg_statistic_ext
rows using expression like (jb #> '{foo,bar}') as stxexprs. This could
also help user to select which paths to analyze simply by using some
sort of CREATE STATISTICS. But it is really unclear how to:
* store pg_statistic_ext rows from typanalyze
* form path expressions for array elements (maybe add new jsonpath
operator)
* match various -> operator chains to stxexprs
* jsonpath-like languages need simple API for searching by stxexprsSure, you can do statistics on expressions, right? Of course, if that
expression produces JSONB value, that's not very useful at the moment.
Maybe we could have two typanalyze functions - one for regular analyze,
one for extended statistics?

That being said, I'm not sure extended stats are a good match for this.
My feeling was we should collect these stats for all JSONB columns,
which is why I argued for putting that in pg_statistic.

Per-path statistics should only be collected for scalars. This can be
enabled by the flag JsonAnalyzeContext.scalarsOnly. But there is a
problem: the computed MCVs and histograms will be broken and we will not
be able to use them for queries like (jb > const) in the general case.
Also we will not be able to use them internally in scalarineqsel() and
var_eq_const() (see jsonSelectivity()). Instead, we will have to implement our own
estimator functions for JSONB comparison operators that will correctly
use our hacked MCVs and histograms (and maybe not all cases will be
supported; for example, comparison to scalars only).

Yeah, but maybe that's an acceptable trade-off? I mean, if we can
improve estimates for most clauses, and there's some number of clauses
that are estimated just like without stats, that's still an improvement,
right?

It's possible to replace objects and arrays with empty ones when
scalarsOnly is set to keep correct frequencies of non-scalars.
But there is an old bug in JSONB comparison: empty arrays are placed
before other values in the JSONB sort order, although they should go
after all scalars. So we also need to distinguish empty and non-empty
arrays here.

Hmmm ...
I tried to fix most of the places marked as XXX and FIXME; the new
version of the patches is attached. There are a lot of changes, you
can see them in a step-by-step form in the corresponding branch
jsonb_stats_20220122 in our GitHub repo [1].

Thanks! I'll go through the changes soon.

Thanks, Nikita and Tomas for these patches.
For the last few days, I was trying to understand these patches, and based
on Tomas's suggestion, I was doing some performance tests.
With the attached .SQL file, I can see that analyze is taking more time
with these patches.
*Setup:*
autovacuum=off
all other settings are default.
Load the attached file with and without the patch to compare the time taken
by analyze.
*With json patches:*
postgres=# analyze test ;
ANALYZE
Time: *28464.062 ms (00:28.464)*
postgres=#
postgres=# SELECT pg_size_pretty(
pg_total_relation_size('pg_catalog.pg_statistic') );
pg_size_pretty
----------------
328 kB
(1 row)
--
*Without json patches:*
postgres=# analyze test ;
ANALYZE
*Time: 120.864* ms
postgres=# SELECT pg_size_pretty(
pg_total_relation_size('pg_catalog.pg_statistic') );
pg_size_pretty
----------------
272 kB
I haven't found the root cause of this, but I feel that this time is due
to looping over all the paths.
In my test data, there is a total of 951 distinct paths. While doing
the analysis, first we check all the sample rows (30000) and collect all
the distinct paths (here 951), and after that, for every single path, we
loop over all the sample rows again to collect stats for that particular
path. I feel that these loops might be taking time.
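For illustration, here is a toy, self-contained C sketch (not the patch
code; the names and constants are invented) of the loop shape described
above, which is what makes the per-path pass cost O(paths * rows):

#include <stdio.h>

#define NROWS  30000                /* sample rows */
#define NPATHS 951                  /* distinct paths in the test data */

static void
collect_stats_for_path(int path, int row)
{
    (void) path;                    /* stand-in for the real per-path work */
    (void) row;
}

int
main(void)
{
    long    walks = 0;

    for (int p = 0; p < NPATHS; p++)        /* one pass per extracted path */
        for (int r = 0; r < NROWS; r++)     /* ... re-scanning every sample row */
        {
            collect_stats_for_path(p, r);
            walks++;
        }

    printf("%ld document walks\n", walks);  /* 951 * 30000 = 28530000 */
    return 0;
}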
I will run perf and will try to find out the root cause of this.
Apart from this performance issue, I haven't found any crashes or issues.
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
On Thu, 6 Jan 2022 at 14:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Not sure I understand. I wasn't suggesting any user-defined filtering,
but something done by default, similarly to what we do for regular MCV
lists, based on frequency. We'd include frequent paths while excluding
rare ones. So no need for a user interface.
Not sure, but I think he was agreeing with you: that we should figure
out the baseline behaviour and get it as useful as possible first, then
later look at adding some way to customize it. I agree -- I don't
think the user interface will be hard technically but I think it will
require some good ideas and there could be lots of bikeshedding. And a
lot of users will never even use it anyways so it's important to get
the defaults as useful as possible.
Similarly for the non-scalar values - I don't think we can really keep
regular statistics on such values (for the same reason why it's not
enough for whole JSONB columns), so why build/store that anyway.
For a default behaviour I wonder if it wouldn't be better to just
flatten and extract all the scalars. So if there's no matching path
then at least we have some way to estimate how often a scalar appears
anywhere in the json document.
That amounts to assuming the user knows the right path to find a given
scalar and there isn't a lot of overlap between keys. So it would at
least do something useful if you have something like {gender: female,
name: {first: nancy, last: reagan}, state: california, country: usa}.
It might get things slightly wrong if you have some people named
"georgia" or have names that can be first or last names.
But it would generally be doing something more or less useful as long
as they look for "usa" in the country field and "male" in the gender
field. If they looked for "male" in $.name.first path it would give
bad estimates but assuming they know their data structure they won't
be doing that.
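For illustration, here is a toy, self-contained C sketch of that fallback
(the Node type and the tiny document are invented, not from the patch):
walk the document and count every scalar, ignoring its path.

#include <stdio.h>

typedef struct Node
{
    const char  *scalar;        /* non-NULL for leaf (scalar) nodes */
    struct Node *children[3];   /* toy fan-out, unused slots are NULL */
} Node;

static void
collect_scalars(const Node *node)
{
    if (node == NULL)
        return;
    if (node->scalar != NULL)
        printf("scalar: %s\n", node->scalar);   /* feed MCV counting here */
    for (int i = 0; i < 3; i++)
        collect_scalars(node->children[i]);
}

int
main(void)
{
    /* {name: {first: "nancy"}, country: "usa"} as a toy tree */
    Node    first   = {"nancy", {NULL, NULL, NULL}};
    Node    name    = {NULL, {&first, NULL, NULL}};
    Node    country = {"usa", {NULL, NULL, NULL}};
    Node    doc     = {NULL, {&name, &country, NULL}};

    collect_scalars(&doc);      /* prints "nancy" and "usa", paths ignored */
    return 0;
}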
--
greg
On 1/25/22 17:56, Mahendra Singh Thalor wrote:
...
For the last few days, I was trying to understand these patches, and
based on Tomas's suggestion, I was doing some performance tests.

With the attached .SQL file, I can see that analyze is taking more time
with these patches.

*Setup:*
autovacuum=off
all other settings are default.

Load the attached file with and without the patch to compare the time
taken by analyze.

*With json patches:*
postgres=# analyze test ;
ANALYZE
Time: *28464.062 ms (00:28.464)*
postgres=#
postgres=# SELECT pg_size_pretty(
pg_total_relation_size('pg_catalog.pg_statistic') );
pg_size_pretty
----------------
328 kB
(1 row)
--

*Without json patches:*
postgres=# analyze test ;
ANALYZE
*Time: 120.864* ms
postgres=# SELECT pg_size_pretty(
pg_total_relation_size('pg_catalog.pg_statistic') );
pg_size_pretty
----------------
272 kB

I haven't found the root cause of this, but I feel that this time is due
to looping over all the paths.
In my test data, there is a total of 951 distinct paths. While doing
the analysis, first we check all the sample rows (30000) and collect all
the distinct paths (here 951), and after that, for every single path,
we loop over all the sample rows again to collect stats for that
particular path. I feel that these loops might be taking time.

I will run perf and will try to find out the root cause of this.
Thanks, I've been doing some performance tests too, and you're right it
takes quite a bit of time. I wanted to compare how the timing changes
with complexity of the JSON documents (nesting, number of keys, ...) so
I wrote a simple python script to generate random JSON documents with
different parameters - see the attached json-generate.py script.
It's a bit crude, but it generates synthetic documents with a chosen
number of levels, keys per level, distinct key values, etc. The
generated documents are loaded directly into a "json_test" database,
into a table "test_table" with a single jsonb column called "v".
Tweaking this to connect to a different database, or just dump the
generated documents to a file, should be trivial.
The attached bash script runs the data generator for a couple of
combinations, and then measures how long it takes to analyze the table,
how large the statistics are (in a rather crude way), etc.
The results look like this (the last two columns are analyze duration in
milliseconds, for master and with the patch):
levels keys unique keys paths master patched
----------------------------------------------------------
1 1 1 2 153 122
1 1 1000 1001 134 1590
1 8 8 9 157 367
1 8 1000 1001 155 1838
1 64 64 65 189 2298
1 64 1000 1001 46 9322
2 1 1 3 237 197
2 1 1000 30580 152 46468
2 8 8 73 245 1780
So yeah, it's significantly slower - in most cases not as much as you
observed, but an order of magnitude slower than master. For size of the
statistics, it's similar:
levels keys unique keys paths table size master patched
------------------------------------------------------------------
1 1 1 2 1843200 16360 24325
1 1 1000 1001 1843200 16819 1425400
1 8 8 9 4710400 28948 88837
1 8 1000 1001 6504448 42694 3915802
1 64 64 65 30154752 209713 689842
1 64 1000 1001 49086464 1093 7755214
2 1 1 3 2572288 24883 48727
2 1 1000 30580 2572288 11422 26396365
2 8 8 73 23068672 164785 862258
This is measured by dumping pg_statistic for the column, so in the
database it might be compressed etc. It's larger, but that's somewhat
expected because we simply store more detailed stats. The size grows
with the number of paths extracted - which is expected, of course.
If you're wondering why this doesn't show data for additional combinations
(e.g. 2 levels, 8 keys and 1000 distinct key values), that's the bad
news - that takes ages (multiple minutes) and then it gets killed by the OOM
killer because it eats gigabytes of memory.
I agree the slowness is largely due to extracting all paths and then
processing them one by one - which means we have to loop over the tuples
over and over. In this case there's about 850k distinct paths extracted,
so we do ~850k loops over 30k tuples. That's gotta take time.
I don't know what exactly to do about this, but I already mentioned we
may need to pick a subset of paths to keep, similarly to how we pick
items for MCV. I mean, if we only saw a path once or twice, it's
unlikely to be interesting enough to build stats for it. I haven't
tried, but I'd bet most of the 850k paths might be ignored like this.
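As a rough, self-contained C illustration of such a frequency cut-off
(the threshold, paths and counts are made up; this is not the patch code):

#include <stdio.h>

typedef struct
{
    const char *path;
    int         count;      /* occurrences among the sampled documents */
} PathCount;

#define MIN_COUNT 2         /* invented threshold: ignore one-off paths */

int
main(void)
{
    PathCount   paths[] = {
        {"$.country", 29500},
        {"$.name.first", 29500},
        {"$.tmp.debug", 1},
    };
    int         npaths = sizeof(paths) / sizeof(paths[0]);

    for (int i = 0; i < npaths; i++)
    {
        if (paths[i].count >= MIN_COUNT)
            printf("build stats for %s (%d occurrences)\n",
                   paths[i].path, paths[i].count);
        else
            printf("skip rare path %s\n", paths[i].path);
    }
    return 0;
}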
The other thing we might do is make the loops more efficient. For
example, we might track which documents contain each path (by a small
bitmap or something), so that in the loop we can skip rows that don't
contain the path we're currently processing. Or something like that.
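Something along these lines, as a self-contained toy (the row/path layout
is invented; it only demonstrates the skip-rows-by-bitmap idea):

#include <stdio.h>

#define NROWS 8                                 /* toy sample size */

typedef struct
{
    const char     *path;
    unsigned char   rowmap[(NROWS + 7) / 8];    /* 1 bit per sample row */
} PathEntry;

static void
path_mark_row(PathEntry *p, int row)
{
    p->rowmap[row / 8] |= (unsigned char) (1 << (row % 8));
}

static int
path_has_row(const PathEntry *p, int row)
{
    return (p->rowmap[row / 8] >> (row % 8)) & 1;
}

int
main(void)
{
    PathEntry   p = {"$.user.name", {0}};

    /* pass 1: while extracting paths, remember the rows containing them */
    path_mark_row(&p, 1);
    path_mark_row(&p, 5);

    /* pass 2: the per-path loop visits only rows that contain the path */
    for (int row = 0; row < NROWS; row++)
        if (path_has_row(&p, row))
            printf("collect stats for %s from row %d\n", p.path, row);

    return 0;
}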
Of course, this can't eliminate all the overhead - we're building more
stats and that has a cost. In the "common" case of a stable "fixed" schema
with the same paths in all documents, we'll still need to do a loop for
each of them. So it's bound to be slower than master.
Which probably means it's a bad idea to do this for all JSONB columns,
because in many cases the extra stats are not needed so the extra
analyze time would be a waste. So I guess we'll need some way to enable
this only for selected columns ... I argued against the idea to
implement this as extended statistics in the first message, but it's a
reasonably nice way to do such stuff (expression stats are a precedent).
Apart from this performance issue, I haven't found any crashes or issues.
Well, I haven't seen any crashes either, but as I mentioned for complex
documents (2 levels, many distinct keys) the ANALYZE starts consuming a
lot of memory and may get killed by OOM. For example if you generate
documents like this
./json-generate.py 30000 2 8 1000 6 1000
and then run ANALYZE, that'll take ages and it very quickly gets into a
situation like this (generated from gdb by calling MemoryContextStats on
TopMemoryContext):
-------------------------------------------------------------------------
TopMemoryContext: 80776 total in 6 blocks; 13992 free (18 chunks); 66784
used
...
TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
PortalContext: 1024 total in 1 blocks; 488 free (0 chunks); 536
used: <unnamed>
Analyze: 472726496 total in 150 blocks; 3725776 free (4 chunks);
469000720 used
Analyze Column: 921177696 total in 120 blocks; 5123256 free
(238 chunks); 916054440 used
Json Analyze Tmp Context: 8192 total in 1 blocks; 5720 free
(1 chunks); 2472 used
Json Analyze Pass Context: 8192 total in 1 blocks; 7928
free (0 chunks); 264 used
JSON analyze path table: 1639706040 total in 25084 blocks;
1513640 free (33 chunks); 1638192400 used
Vacuum: 8192 total in 1 blocks; 7448 free (0 chunks); 744 used
...
Grand total: 3035316184 bytes in 25542 blocks; 10971120 free (352
chunks); 3024345064 used
-------------------------------------------------------------------------
Yes, that's the backend using 3GB of memory, out of which 1.6GB is in the "analyze
path table" context, 400MB in "analyze" and 900MB in "analyze column"
contexts. I mean, that seems a bit excessive. And it grows over time, so
after a while my laptop gives up and kills the backend.
I'm not sure if it's a memory leak (which would be fixable), or it's due
to keeping stats for all the extracted paths. I mean, in this particular
case we have 850k paths - even if stats are just 1kB per path, that's
850MB. This requires more investigation.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2/4/22 03:47, Tomas Vondra wrote:
./json-generate.py 30000 2 8 1000 6 1000
Sorry, this should be (different order of parameters):
./json-generate.py 30000 2 1000 8 6 1000
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 04.02.2022 05:47, Tomas Vondra wrote:
On 1/25/22 17:56, Mahendra Singh Thalor wrote:
...
For the last few days, I was trying to understand these patches, and
based on Tomas's suggestion, I was doing some performance tests.

With the attached .SQL file, I can see that analyze is taking more
time with these patches.

I haven't found the root cause of this, but I feel that this time is
due to looping over all the paths.
In my test data, there is a total of 951 distinct paths. While
doing the analysis, first we check all the sample rows (30000) and
collect all the distinct paths (here 951), and after that, for
every single path, we loop over all the sample rows again to collect
stats for that particular path. I feel that these loops might be taking
time.

Thanks, I've been doing some performance tests too, and you're right
it takes quite a bit of time.
That is absolutely not surprising; I have warned about poor performance
in cases with a large number of paths.
I agree the slowness is largely due to extracting all paths and then
processing them one by one - which means we have to loop over the
tuples over and over. In this case there's about 850k distinct paths
extracted, so we do ~850k loops over 30k tuples. That's gotta take time.

I don't know what exactly to do about this, but I already mentioned we
may need to pick a subset of paths to keep, similarly to how we pick
items for MCV. I mean, if we only saw a path once or twice, it's
unlikely to be interesting enough to build stats for it. I haven't
tried, but I'd bet most of the 850k paths might be ignored like this.

The other thing we might do is make the loops more efficient. For
example, we might track which documents contain each path (by a small
bitmap or something), so that in the loop we can skip rows that don't
contain the path we're currently processing. Or something like that.

Apart from this performance issue, I haven't found any crashes or
issues.

Well, I haven't seen any crashes either, but as I mentioned for
complex documents (2 levels, many distinct keys) the ANALYZE starts
consuming a lot of memory and may get killed by OOM. For example if
you generate documents like this

./json-generate.py 30000 2 8 1000 6 1000

and then run ANALYZE, that'll take ages and it very quickly gets into
a situation like this (generated from gdb by calling
MemoryContextStats on TopMemoryContext):

-------------------------------------------------------------------------
TopMemoryContext: 80776 total in 6 blocks; 13992 free (18 chunks);
66784 used
...
TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536
used
PortalContext: 1024 total in 1 blocks; 488 free (0 chunks); 536
used: <unnamed>
Analyze: 472726496 total in 150 blocks; 3725776 free (4 chunks);
469000720 used
Analyze Column: 921177696 total in 120 blocks; 5123256 free
(238 chunks); 916054440 used
Json Analyze Tmp Context: 8192 total in 1 blocks; 5720 free
(1 chunks); 2472 used
Json Analyze Pass Context: 8192 total in 1 blocks; 7928
free (0 chunks); 264 used
JSON analyze path table: 1639706040 total in 25084 blocks;
1513640 free (33 chunks); 1638192400 used
Vacuum: 8192 total in 1 blocks; 7448 free (0 chunks); 744 used
...
Grand total: 3035316184 bytes in 25542 blocks; 10971120 free (352
chunks); 3024345064 used
-------------------------------------------------------------------------

Yes, that's the backend using 3GB of memory, out of which 1.6GB is in the "analyze
path table" context, 400MB in "analyze" and 900MB in "analyze column"
contexts. I mean, that seems a bit excessive. And it grows over time,
so after a while my laptop gives up and kills the backend.

I'm not sure if it's a memory leak (which would be fixable), or it's
due to keeping stats for all the extracted paths. I mean, in this
particular case we have 850k paths - even if stats are just 1kB per
path, that's 850MB. This requires more investigation.
Thank you for the tests and investigation.
I have tried to reduce memory consumption and speed up row scanning:
1. "JSON analyze path table" context contained ~1KB JsonPathAnlStats
structure per JSON path in the global hash table. I have moved
JsonPathAnlStats to the stack of compute_json_stats(), and hash
table now consumes ~70 bytes per path.
2. I have fixed the copying of the resulting JSONB stats into the
context, which reduced the size of the "Analyze Column" context.
3. I have optimized memory consumption of the single-pass algorithm by
storing only value lists in the non-temporary context. That helped to
execute the "2 64 64" test case in 30 seconds. Single-pass is a
bit faster in non-TOASTed cases, and much faster in TOASTed ones.
But it consumes much more memory and still goes to OOM in the
cases with more than ~100k paths.
4. I have implemented per-path document lists/bitmaps, which really
speed up the case "2 8 1000". The list is converted into a bitmap when
it becomes larger than the bitmap would be (see the sketch after this
list).
5. Also I have fixed some bugs.
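A toy, self-contained C illustration of the switch-over rule in point 4
(the layout is assumed, not the patch code): a list of row numbers is kept
only while it is smaller than the equivalent bitmap.

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    int     nrows = 30000;                  /* sample rows */
    size_t  bitmap_bytes = (nrows + 7) / 8; /* 1 bit per row = 3750 bytes */

    /* a list of uint32 row numbers wins only while it stays short */
    for (int nmatches = 0; nmatches <= 1500; nmatches += 500)
    {
        size_t  list_bytes = (size_t) nmatches * sizeof(uint32_t);

        printf("%5d matching rows: keep %s\n", nmatches,
               list_bytes <= bitmap_bytes ? "list" : "bitmap");
    }
    return 0;
}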
You can find all these changes in commit form in our GitHub repository
on the branch jsonb_stats_20220310 [1].
Updated results of the test:
levels  keys  uniq keys   paths   master   multi-pass   single-pass
                                    (ms)    (ms)  (MB)    (ms)  (MB)
-------------------------------------------------------------------
1 1 1 2 153 122 10 82 14
1 1 1000 1001 134 105 11 78 38
1 8 8 9 157 384 19 328 32
1 8 1000 1001 155 454 23 402 72
1 64 64 65 129 2889 45 2386 155
1 64 1000 1001 158 3990 94 1447 177
2 1 1 3 237 147 10 91 16
2 1 1000 30577 152 264 32 394 234
2 8 8 72 245 1943 37 1692 139
2 8 1000 852333 152 9175 678 OOM
2 64 64 4161 1784 ~1 hour 53 30018 1750
2 64 1000 1001001 4715 ~4 hours 1600 OOM
The last two multi-pass results are too slow because the JSONBs become
TOASTed. For measuring master in these tests, I disabled the
WIDTH_THRESHOLD check, which skips TOASTed values > 1kB.
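For context, that check lives in the stock analyze code; roughly
paraphrased from src/backend/commands/analyze.c (not verbatim, consult
the source):

/* Paraphrased: values wider than WIDTH_THRESHOLD bytes are counted but
 * excluded from the input to the MCV/histogram computations. */
#define WIDTH_THRESHOLD 1024

if (toast_raw_datum_size(value) > WIDTH_THRESHOLD)
{
    toowide_cnt++;
    continue;       /* skip the value, don't feed it to the stats */
}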
Next, I am going to try to disable all-paths collection and implement
collection of most common paths (and/or hashed paths maybe).
[1]: https://github.com/postgrespro/postgres/tree/jsonb_stats_20220310
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Add-pg_operator.oprstat-for-derived-operator-statist-20220310.patch
0002-Add-stats_form_tuple-20220310.patch
0003-Add-symbolic-names-for-some-jsonb-operators-20220310.patch
0004-Add-helper-jsonb-functions-and-macros-20220310.patch
0005-Export-scalarineqsel-20220310.patch
0006-Add-jsonb-statistics-20220310.patch
On Fri, 4 Feb 2022 at 08:30, Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
On 2/4/22 03:47, Tomas Vondra wrote:
./json-generate.py 30000 2 8 1000 6 1000
Sorry, this should be (different order of parameters):
./json-generate.py 30000 2 1000 8 6 1000
Thanks, Tomas for this test case.
Hi Hackers,
For the last few days, I was testing on top of these JSON optimizer
patches and was taking help from Tomas Vondra to understand the patches
and the test results.
Thanks, Tomas, for your feedback and suggestions.
Below is the summary:
*Point 1)* analyze takes a very long time for large documents:
For large JSON documents, analyze takes a very long time compared to the
current head. For reference, I am attaching the test file (./json-generate.py
30000 2 1000 8 6 1000)
Head: analyze test ; Time: 120.864 ms
With patch: analyze test ; Time: more than 2 hours
analyze takes a very long time because with these patches, we first
iterate over all sample rows (in the above case 30000) and store all the
paths (here around 850k paths).
In another pass, we take one path at a time and collect stats for that
particular path by analyzing all the sample rows, and we continue this
process for all 850k paths. In other words, we do 850k loops, and in each
loop we extract values for a single path.
*Point 2)* memory consumption increases rapidly for large documents:
In the above test case, there are in total 851k paths, and to keep stats
for one path, we allocate 1120 bytes.
Total paths : 852689 ~ 852k
Memory for 1 path to keep stats: 1120 ~ 1 KB
(sizeof(JsonValueStats) = 1120 from “Analyze Column”)
Total memory for all paths: 852689 * 1120 = 955011680 ~ 955 MB
Each path needs some additional memory on top of this: while analyzing a
path, we allocate some more memory depending on frequency and other factors.
To keep all entries (851k paths) in the hash table, we use around 1GB of
memory, which is also very large.
*Point 3*) Review comment noticed by Tomas Vondra:
+ oldcxt = MemoryContextSwitchTo(ctx->stats->anl_context);
+ pstats->stats = jsonAnalyzeBuildPathStats(pstats);
+ MemoryContextSwitchTo(oldcxt);
Above should be:
+ oldcxt = MemoryContextSwitchTo(ctx->mcxt);
+ pstats->stats = jsonAnalyzeBuildPathStats(pstats);
+ MemoryContextSwitchTo(oldcxt);
*Response from Tomas Vondra:*
The problem is "anl_context" is actually "Analyze", i.e. the context for
the whole ANALYZE command, for all the columns. But we only want to keep
those path stats while processing a particular column. At the end, after
processing all paths from a column, we need to "build" the final stats for
the column, and this result needs to go into the "Analyze" context. But all
the partial results need to go into the "Analyze Column" context.
*Point 4)*
+/*
+ * jsonAnalyzeCollectPath
+ * Extract a single path from JSON documents and collect its
values.
+ */
+static void
+jsonAnalyzeCollectPath(JsonAnalyzeContext *ctx, Jsonb *jb, void *param)
+{
+ JsonPathAnlStats *pstats = (JsonPathAnlStats *) param;
+ JsonbValue jbvtmp;
+ JsonbValue *jbv = JsonValueInitBinary(&jbvtmp, jb);
+ JsonPathEntry *path;
+ JsonPathEntry **entries;
+ int i;
+
+ entries = palloc(sizeof(*entries) * pstats->depth);
+
+ /* Build entry array in direct order */
+ for (path = &pstats->path, i = pstats->depth - 1;
+ path->parent && i >= 0;
+ path = path->parent, i--)
+ entries[i] = path;
+
+ jsonAnalyzeCollectSubpath(ctx, pstats, jbv, entries, 0);
+
+ pfree(entries);
---- Many times we try to palloc with zero size, and entries then points
to invalid memory (because pstats->depth = 0), so I think we should not
try to palloc with 0.
*Fix:*
+ if (pstats->depth)
+ entries = palloc(sizeof(*entries) * pstats->depth);
From these points, we can say that we should rethink our design to collect
stats for all paths.
We can set limits (like MCV) for paths, or we can let the user give an
explicit path to collect stats for that path only, or we can pass a
subset of the JSON values.
In the above case, there are in total 851k paths, but we could collect
stats for only the 1000 most common paths; this way we minimize both time
and memory, and we might even keep at least frequencies for the
non-analyzed paths.
Next, I will take the latest patches from Nikita's last email and I will do
more tests.
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
This patch has bitrotted, presumably after the other JSON patchset was
applied. It looks like it's failing in the json header file so it may
be as simple as additional functions added on nearby lines.
Please rebase. Reminder, it's the last week of the commitfest so time
is of the essence....
I noticed some typos.
diff --git a/src/backend/utils/adt/jsonb_selfuncs.c b/src/backend/utils/adt/jsonb_selfuncs.c
index f5520f88a1d..d98cd7020a1 100644
--- a/src/backend/utils/adt/jsonb_selfuncs.c
+++ b/src/backend/utils/adt/jsonb_selfuncs.c
@@ -1342,7 +1342,7 @@ jsonSelectivityContains(JsonStats stats, Jsonb *jb)
path->stats = jsonStatsFindPath(stats, pathstr.data,
pathstr.len);
- /* Appeend path string entry for array elements, get stats. */
+ /* Append path string entry for array elements, get stats. */
jsonPathAppendEntry(&pathstr, NULL);
pstats = jsonStatsFindPath(stats, pathstr.data, pathstr.len);
freq = jsonPathStatsGetFreq(pstats, 0.0);
@@ -1367,7 +1367,7 @@ jsonSelectivityContains(JsonStats stats, Jsonb *jb)
case WJB_END_ARRAY:
{
struct Path *p = path;
- /* Absoulte selectivity of the path with its all subpaths */
+ /* Absolute selectivity of the path with its all subpaths */
Selectivity abs_sel = p->sel * p->freq;
/* Pop last path entry */
diff --git a/src/backend/utils/adt/jsonb_typanalyze.c b/src/backend/utils/adt/jsonb_typanalyze.c
index 7882db23a87..9a759aadafb 100644
--- a/src/backend/utils/adt/jsonb_typanalyze.c
+++ b/src/backend/utils/adt/jsonb_typanalyze.c
@@ -123,10 +123,9 @@ typedef struct JsonScalarStats
/*
* Statistics calculated for a set of values.
*
- *
* XXX This seems rather complicated and needs simplification. We're not
* really using all the various JsonScalarStats bits, there's a lot of
- * duplication (e.g. each JsonScalarStats contains it's own array, which
+ * duplication (e.g. each JsonScalarStats contains its own array, which
* has a copy of data from the one in "jsons").
*/
typedef struct JsonValueStats
@@ -849,7 +848,7 @@ jsonAnalyzePathValues(JsonAnalyzeContext *ctx, JsonScalarStats *sstats,
stats->stanullfrac = (float4)(1.0 - freq);
/*
- * Similarly, we need to correct the MCV frequencies, becuse those are
+ * Similarly, we need to correct the MCV frequencies, because those are
* also calculated only from the non-null values. All we need to do is
* simply multiply that with the non-NULL frequency.
*/
@@ -1015,7 +1014,7 @@ jsonAnalyzeBuildPathStats(JsonPathAnlStats *pstats)
/*
* We keep array length stats here for queries like jsonpath '$.size() > 5'.
- * Object lengths stats can be useful for other query lanuages.
+ * Object lengths stats can be useful for other query languages.
*/
if (vstats->arrlens.values.count)
jsonAnalyzeMakeScalarStats(&ps, "array_length", &vstats->arrlens.stats);
@@ -1069,7 +1068,7 @@ jsonAnalyzeCalcPathFreq(JsonAnalyzeContext *ctx, JsonPathAnlStats *pstats,
* We're done with accumulating values for this path, so calculate the
* statistics for the various arrays.
*
- * XXX I wonder if we could introduce some simple heuristict on which
+ * XXX I wonder if we could introduce some simple heuristic on which
* paths to keep, similarly to what we do for MCV lists. For example a
* path that occurred just once is not very interesting, so we could
* decide to ignore it and not build the stats. Although that won't
@@ -1414,7 +1413,7 @@ compute_json_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
/*
* Collect and analyze JSON path values in single or multiple passes.
- * Sigle-pass collection is faster but consumes much more memory than
+ * Single-pass collection is faster but consumes much more memory than
* collecting and analyzing by the one path at pass.
*/
if (ctx.single_pass)
On Fri, 1 Apr 2022 at 20:21, Greg Stark <stark@mit.edu> wrote:
This patch has bitrotted, presumably after the other JSON patchset was
applied. It looks like it's failing in the json header file so it may
be as simple as additional functions added on nearby lines.

Please rebase. Reminder, it's the last week of the commitfest so time
is of the essence....
Thanks, Greg for the report.
Here, I am attaching re-based patches of the v05 series. These patches
are re-based on the commit 7dd3ee508432730d15c5.
I noticed some typos.
diff --git a/src/backend/utils/adt/jsonb_selfuncs.c b/src/backend/utils/adt/jsonb_selfuncs.c
index f5520f88a1d..d98cd7020a1 100644

Thanks, Justin, for the review. We will fix these comments in the next version.
Next, I am going to try to disable all-paths collection and implement
collection of most common paths (and/or hashed paths maybe).
Thanks, Nikita, for the v04 series of patches. I tested on top of
your patches and verified that the time taken by analyze is reduced for
large complex JSON docs.
With the v03 patches, it was more than 2 hours, and with the v04 patches,
it is only 39 sec (time for Tomas's test case).
I am waiting for your patches (disable all-paths collection and
implement collection of most common paths)
Just for testing purposes, I am posting re-based patches here.
--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com