Multivariate MCV stats can leak data to unprivileged users

Started by Dean Rasheedover 6 years ago34 messages

dean.a.rasheed@gmail.com

over 6 years ago

1 attachment(s)

While working on 1aebfbea83c, I noticed that the new multivariate MCV
stats feature suffers from the same problem, and also the original
problems that were fixed in e2d4ef8de8 and earlier --- namely that a
user can see values in the MCV lists that they shouldn't see (values
from tables that they don't have privileges on).

I think there are 2 separate issues here:

1). The table pg_statistic_ext is accessible to anyone, so any user
can see the MCV lists of any table. I think we should give this the
same treatment as pg_statistic, and hide it behind a security barrier
view, revoking public access from the table.

2). The multivariate MCV stats planner code can be made to invoke
user-defined operators, so a user can create a leaky operator and use
it to reveal data values from the MCV lists even if they have no
permissions on the table.

Attached is a draft patch to fix (2), which hooks into
statext_is_compatible_clause().

I haven't thought much about (1). There are some questions about what
exactly the view should look like. Probably it should translate table
oids to names, like pg_stats does, but should it also translate column
indexes to names? That could get fiddly. Should it unpack MCV items?

I'll raise this as an open item for PG12.

Regards,
Dean

Attachments:

fix-mcv-stats-planner-leak.patchtext/x-patch; charset=US-ASCII; name=fix-mcv-stats-planner-leak.patchDownload

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
new file mode 100644
index ac0ae52..14a7041
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -23,6 +23,7 @@
 #include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
+#include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
@@ -753,7 +754,8 @@ choose_best_statistics(List *stats, Bitm
  * attribute numbers from all compatible clauses (recursively).
  */
 static bool
-statext_is_compatible_clause_internal(Node *clause, Index relid, Bitmapset **attnums)
+statext_is_compatible_clause_internal(PlannerInfo *root, Node *clause,
+									  Index relid, Bitmapset **attnums)
 {
 	/* Look inside any binary-compatible relabeling (as in examine_variable) */
 	if (IsA(clause, RelabelType))
@@ -784,6 +786,7 @@ statext_is_compatible_clause_internal(No
 	/* (Var op Const) or (Const op Var) */
 	if (is_opclause(clause))
 	{
+		RangeTblEntry *rte = root->simple_rte_array[relid];
 		OpExpr	   *expr = (OpExpr *) clause;
 		Var		   *var;
 		bool		varonleft = true;
@@ -826,9 +829,24 @@ statext_is_compatible_clause_internal(No
 				return false;
 		}
 
+		/*
+		 * If there are any securityQuals on the RTE from security barrier
+		 * views or RLS policies, then the user may not have access to all the
+		 * table's data, and we must check that the operator is leak-proof.
+		 *
+		 * If the operator is leaky, then we must ignore this clause for the
+		 * purposes of estimating with MCV lists, otherwise the operator might
+		 * reveal values from the MCV list that the user doesn't have
+		 * permission to see.
+		 */
+		if (rte->securityQuals != NIL &&
+			!get_func_leakproof(get_opcode(expr->opno)))
+			return false;
+
 		var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
 
-		return statext_is_compatible_clause_internal((Node *) var, relid, attnums);
+		return statext_is_compatible_clause_internal(root, (Node *) var,
+													 relid, attnums);
 	}
 
 	/* AND/OR/NOT clause */
@@ -859,7 +877,8 @@ statext_is_compatible_clause_internal(No
 			 * Had we found incompatible clause in the arguments, treat the
 			 * whole clause as incompatible.
 			 */
-			if (!statext_is_compatible_clause_internal((Node *) lfirst(lc),
+			if (!statext_is_compatible_clause_internal(root,
+													   (Node *) lfirst(lc),
 													   relid, attnums))
 				return false;
 		}
@@ -879,7 +898,8 @@ statext_is_compatible_clause_internal(No
 		if (!IsA(nt->arg, Var))
 			return false;
 
-		return statext_is_compatible_clause_internal((Node *) (nt->arg), relid, attnums);
+		return statext_is_compatible_clause_internal(root, (Node *) (nt->arg),
+													 relid, attnums);
 	}
 
 	return false;
@@ -902,9 +922,12 @@ statext_is_compatible_clause_internal(No
  * complex cases, for example (Var op Var).
  */
 static bool
-statext_is_compatible_clause(Node *clause, Index relid, Bitmapset **attnums)
+statext_is_compatible_clause(PlannerInfo *root, Node *clause, Index relid,
+							 Bitmapset **attnums)
 {
+	RangeTblEntry *rte = root->simple_rte_array[relid];
 	RestrictInfo *rinfo = (RestrictInfo *) clause;
+	Oid			userid;
 
 	if (!IsA(rinfo, RestrictInfo))
 		return false;
@@ -917,8 +940,43 @@ statext_is_compatible_clause(Node *claus
 	if (bms_membership(rinfo->clause_relids) != BMS_SINGLETON)
 		return false;
 
-	return statext_is_compatible_clause_internal((Node *) rinfo->clause,
-												 relid, attnums);
+	/* Check the clause and determine what attributes it references. */
+	if (!statext_is_compatible_clause_internal(root, (Node *) rinfo->clause,
+											   relid, attnums))
+		return false;
+
+	/*
+	 * Check that the user has permission to read all these attributes.  Use
+	 * checkAsUser if it's set, in case we're accessing the table via a view.
+	 */
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+	if (pg_class_aclcheck(rte->relid, userid, ACL_SELECT) != ACLCHECK_OK)
+	{
+		/* Don't have table privilege, must check individual columns */
+		if (bms_is_member(InvalidAttrNumber, *attnums))
+		{
+			/* Have a whole-row reference, must have access to all columns */
+			if (pg_attribute_aclcheck_all(rte->relid, userid, ACL_SELECT,
+										  ACLMASK_ALL) != ACLCHECK_OK)
+				return false;
+		}
+		else
+		{
+			/* Check the columns referenced by the clause */
+			int			attnum = -1;
+
+			while ((attnum = bms_next_member(*attnums, attnum)) >= 0)
+			{
+				if (pg_attribute_aclcheck(rte->relid, attnum, userid,
+										  ACL_SELECT) != ACLCHECK_OK)
+					return false;
+			}
+		}
+	}
+
+	/* If we reach here, the clause is OK */
+	return true;
 }
 
 /*
@@ -1020,7 +1078,7 @@ statext_mcv_clauselist_selectivity(Plann
 		Bitmapset  *attnums = NULL;
 
 		if (!bms_is_member(listidx, *estimatedclauses) &&
-			statext_is_compatible_clause(clause, rel->relid, &attnums))
+			statext_is_compatible_clause(root, clause, rel->relid, &attnums))
 		{
 			list_attnums[listidx] = attnums;
 			clauses_attnums = bms_add_members(clauses_attnums, attnums);
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
new file mode 100644
index 6dfca7a..aea5ada
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -684,3 +684,63 @@ SELECT * FROM check_estimated_rows('SELE
          1 |      0
 (1 row)
 
+-- Permission tests. Users should not be able to see specific data values in
+-- the extended statistics, if they lack permission to see those values in
+-- the underlying table.
+--
+-- Currently this is only relevant for MCV stats.
+CREATE TABLE priv_test_tbl (
+    a int,
+    b int
+);
+INSERT INTO priv_test_tbl
+     SELECT mod(i,5), mod(i,10) FROM generate_series(1,100) s(i);
+CREATE STATISTICS priv_test_stats (mcv) ON a, b
+  FROM priv_test_tbl;
+ANALYZE priv_test_tbl;
+-- User with no access
+CREATE USER regress_stats_user1;
+SET SESSION AUTHORIZATION regress_stats_user1;
+SELECT * FROM priv_test_tbl; -- Permission denied
+ERROR:  permission denied for table priv_test_tbl
+-- Attempt to gain access using a leaky operator
+CREATE FUNCTION op_leak(int, int) RETURNS bool
+    AS 'BEGIN RAISE NOTICE ''op_leak => %, %'', $1, $2; RETURN $1 < $2; END'
+    LANGUAGE plpgsql;
+CREATE OPERATOR <<< (procedure = op_leak, leftarg = int, rightarg = int,
+                     restrict = scalarltsel);
+SELECT * FROM priv_test_tbl WHERE a <<< 0 AND b <<< 0; -- Permission denied
+ERROR:  permission denied for table priv_test_tbl
+DELETE FROM priv_test_tbl WHERE a <<< 0 AND b <<< 0; -- Permission denied
+ERROR:  permission denied for table priv_test_tbl
+-- Grant access via a security barrier view, but hide all data
+RESET SESSION AUTHORIZATION;
+CREATE VIEW priv_test_view WITH (security_barrier=true)
+    AS SELECT * FROM priv_test_tbl WHERE false;
+GRANT SELECT, DELETE ON priv_test_view TO regress_stats_user1;
+-- Should now have access via the view, but see nothing and leak nothing
+SET SESSION AUTHORIZATION regress_stats_user1;
+SELECT * FROM priv_test_view WHERE a <<< 0 AND b <<< 0; -- Should not leak
+ a | b 
+---+---
+(0 rows)
+
+DELETE FROM priv_test_view WHERE a <<< 0 AND b <<< 0; -- Should not leak
+-- Grant table access, but hide all data with RLS
+RESET SESSION AUTHORIZATION;
+ALTER TABLE priv_test_tbl ENABLE ROW LEVEL SECURITY;
+GRANT SELECT, DELETE ON priv_test_tbl TO regress_stats_user1;
+-- Should now have direct table access, but see nothing and leak nothing
+SET SESSION AUTHORIZATION regress_stats_user1;
+SELECT * FROM priv_test_tbl WHERE a <<< 0 AND b <<< 0; -- Should not leak
+ a | b 
+---+---
+(0 rows)
+
+DELETE FROM priv_test_tbl WHERE a <<< 0 AND b <<< 0; -- Should not leak
+-- Tidy up
+DROP OPERATOR <<< (int, int);
+DROP FUNCTION op_leak(int, int);
+RESET SESSION AUTHORIZATION;
+DROP VIEW priv_test_view;
+DROP TABLE priv_test_tbl;
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
new file mode 100644
index c6a5776..2ac1d27
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -434,3 +434,63 @@ SELECT * FROM check_estimated_rows('SELE
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND NOT b AND c');
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a AND b AND NOT c');
+
+-- Permission tests. Users should not be able to see specific data values in
+-- the extended statistics, if they lack permission to see those values in
+-- the underlying table.
+--
+-- Currently this is only relevant for MCV stats.
+CREATE TABLE priv_test_tbl (
+    a int,
+    b int
+);
+
+INSERT INTO priv_test_tbl
+     SELECT mod(i,5), mod(i,10) FROM generate_series(1,100) s(i);
+
+CREATE STATISTICS priv_test_stats (mcv) ON a, b
+  FROM priv_test_tbl;
+
+ANALYZE priv_test_tbl;
+
+-- User with no access
+CREATE USER regress_stats_user1;
+SET SESSION AUTHORIZATION regress_stats_user1;
+SELECT * FROM priv_test_tbl; -- Permission denied
+
+-- Attempt to gain access using a leaky operator
+CREATE FUNCTION op_leak(int, int) RETURNS bool
+    AS 'BEGIN RAISE NOTICE ''op_leak => %, %'', $1, $2; RETURN $1 < $2; END'
+    LANGUAGE plpgsql;
+CREATE OPERATOR <<< (procedure = op_leak, leftarg = int, rightarg = int,
+                     restrict = scalarltsel);
+SELECT * FROM priv_test_tbl WHERE a <<< 0 AND b <<< 0; -- Permission denied
+DELETE FROM priv_test_tbl WHERE a <<< 0 AND b <<< 0; -- Permission denied
+
+-- Grant access via a security barrier view, but hide all data
+RESET SESSION AUTHORIZATION;
+CREATE VIEW priv_test_view WITH (security_barrier=true)
+    AS SELECT * FROM priv_test_tbl WHERE false;
+GRANT SELECT, DELETE ON priv_test_view TO regress_stats_user1;
+
+-- Should now have access via the view, but see nothing and leak nothing
+SET SESSION AUTHORIZATION regress_stats_user1;
+SELECT * FROM priv_test_view WHERE a <<< 0 AND b <<< 0; -- Should not leak
+DELETE FROM priv_test_view WHERE a <<< 0 AND b <<< 0; -- Should not leak
+
+-- Grant table access, but hide all data with RLS
+RESET SESSION AUTHORIZATION;
+ALTER TABLE priv_test_tbl ENABLE ROW LEVEL SECURITY;
+GRANT SELECT, DELETE ON priv_test_tbl TO regress_stats_user1;
+
+-- Should now have direct table access, but see nothing and leak nothing
+SET SESSION AUTHORIZATION regress_stats_user1;
+SELECT * FROM priv_test_tbl WHERE a <<< 0 AND b <<< 0; -- Should not leak
+DELETE FROM priv_test_tbl WHERE a <<< 0 AND b <<< 0; -- Should not leak
+
+-- Tidy up
+DROP OPERATOR <<< (int, int);
+DROP FUNCTION op_leak(int, int);
+RESET SESSION AUTHORIZATION;
+DROP VIEW priv_test_view;
+DROP TABLE priv_test_tbl;

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Dean Rasheed (#1)

1 attachment(s)

Re: Multivariate MCV stats can leak data to unprivileged users

On Fri, May 10, 2019 at 10:19:44AM +0100, Dean Rasheed wrote:

While working on 1aebfbea83c, I noticed that the new multivariate MCV
stats feature suffers from the same problem, and also the original
problems that were fixed in e2d4ef8de8 and earlier --- namely that a
user can see values in the MCV lists that they shouldn't see (values
from tables that they don't have privileges on).

I think there are 2 separate issues here:

1). The table pg_statistic_ext is accessible to anyone, so any user
can see the MCV lists of any table. I think we should give this the
same treatment as pg_statistic, and hide it behind a security barrier
view, revoking public access from the table.

2). The multivariate MCV stats planner code can be made to invoke
user-defined operators, so a user can create a leaky operator and use
it to reveal data values from the MCV lists even if they have no
permissions on the table.

Attached is a draft patch to fix (2), which hooks into
statext_is_compatible_clause().

I think that patch is good.

I haven't thought much about (1). There are some questions about what
exactly the view should look like. Probably it should translate table
oids to names, like pg_stats does, but should it also translate column
indexes to names? That could get fiddly. Should it unpack MCV items?

Yeah. I suggest we add a simple pg_stats_ext view, similar to pg_stats.
It would:

(1) translate the schema / relation / attribute names

I don't see why translating column indexes to names would be fiddly.
It's a matter of simple unnest + join, no? Or what issues you see?

(2) include values for ndistinct / dependencies, if built

Those don't include any actual values, so this should be OK. You might
make the argument that even this does leak a bit of information (e.g.
when you can see values in one column, and you know there's a strong
functional dependence, it tells you something about the other column
which you may not see). But I think we kinda already leak information
about that through estimates, so maybe that's not an issue.

(3) include MCV list only when user has access to all columns

Essentially, if the user is missing 'select' privilege on at least one
of the columns, there'll be NULL. Otherwise the MCV value.

The attached patch adds pg_stats_ext doing this. I don't claim it's the
best possible query backing the view, so any improvements are welcome.

I've been thinking we might somehow filter the statistics values, e.g. by
not showing values for attributes the user has no 'select' privilege on,
but that seems like a can of worms. It'd lead to MCV items that can't be
distinguished because the only difference was the removed attribute, and
so on. So I propose we simply show/hide the whole MCV list.

Likewise, I don't think it makes sense to expand the MCV list in this
view - that works for the single-dimensional case, because then the
list is expanded into two arrays (values + frequencies), which are easy
to process further. But for multivariate MCV lists that's much more
complex - we don't know how many attributes are there, for example.

So I suggest we just show the pg_mcv_list value as is, and leave it up
to the user to call the pg_mcv_list_items() function if needed.

This will also work for histograms, where expanding the value in the
pg_stats_ext would be even trickier.

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

add-pg-stats-ext.patchtext/plain; charset=us-asciiDownload

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 566100d6df..b1302cfa5d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -253,6 +253,26 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
 
 REVOKE ALL on pg_statistic FROM public;
 
+CREATE VIEW pg_stats_ext WITH (security_barrier) AS
+    SELECT
+        nspname AS schemaname,
+        relname AS tablename,
+        (SELECT array_agg(attname) FROM unnest(stxkeys) k
+           JOIN pg_attribute a ON (a.attnum = k)
+          WHERE attrelid = stxrelid) AS attnames,
+        stxkind,
+        stxndistinct,
+        stxdependencies,
+        (CASE WHEN EXISTS (SELECT 1 FROM unnest(stxkeys) k
+           JOIN pg_attribute a ON (a.attnum = k)
+          WHERE attrelid = stxrelid AND NOT has_column_privilege(c.oid, a.attnum, 'select')) THEN NULL
+         ELSE stxmcv END) stxmcv
+    FROM pg_statistic_ext s JOIN pg_class c ON (c.oid = s.stxrelid)
+         LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
+    WHERE (c.relrowsecurity = false OR NOT row_security_active(c.oid));
+
+REVOKE ALL on pg_statistic_ext FROM public;
+
 CREATE VIEW pg_publication_tables AS
     SELECT
         P.pubname AS pubname,

Dean Rasheed

dean.a.rasheed@gmail.com

over 6 years ago

In reply to: Tomas Vondra (#2)

1 attachment(s)

Re: Multivariate MCV stats can leak data to unprivileged users

On Mon, 13 May 2019 at 23:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Yeah. I suggest we add a simple pg_stats_ext view, similar to pg_stats.
It would:

(1) translate the schema / relation / attribute names

I don't see why translating column indexes to names would be fiddly.
It's a matter of simple unnest + join, no? Or what issues you see?

Yeah, you're right. I thought it would be harder than that. One minor
thing -- I think we should have an explicit ORDER BY where we collect
the column names, to guarantee that they're listed in the right order.

(2) include values for ndistinct / dependencies, if built

Those don't include any actual values, so this should be OK. You might
make the argument that even this does leak a bit of information (e.g.
when you can see values in one column, and you know there's a strong
functional dependence, it tells you something about the other column
which you may not see). But I think we kinda already leak information
about that through estimates, so maybe that's not an issue.

Hmm. For normal statistics, if the user has no permissions on the
table, they don't get to see any of these kinds of statistics, not
even things like n_distinct. I think we should do the same here --
i.e., if the user has no permissions on the table, don't let them see
anything. Such a user will not be able to run EXPLAIN on queries
against the table, so they won't get to see any estimates, and I don't
think they should get to see any extended statistics either.

(3) include MCV list only when user has access to all columns

Essentially, if the user is missing 'select' privilege on at least one
of the columns, there'll be NULL. Otherwise the MCV value.

OK, that seems reasonable, except as I said above, I think that should
apply to all statistics, not just the MCV lists.

The attached patch adds pg_stats_ext doing this. I don't claim it's the
best possible query backing the view, so any improvements are welcome.

I've been thinking we might somehow filter the statistics values, e.g. by
not showing values for attributes the user has no 'select' privilege on,
but that seems like a can of worms. It'd lead to MCV items that can't be
distinguished because the only difference was the removed attribute, and
so on. So I propose we simply show/hide the whole MCV list.

Agreed.

Likewise, I don't think it makes sense to expand the MCV list in this
view - that works for the single-dimensional case, because then the
list is expanded into two arrays (values + frequencies), which are easy
to process further. But for multivariate MCV lists that's much more
complex - we don't know how many attributes are there, for example.

So I suggest we just show the pg_mcv_list value as is, and leave it up
to the user to call the pg_mcv_list_items() function if needed.

I think expanding the MCV lists is actually quite useful because then
you can see arrays of values, nulls, frequencies and base frequencies
in a reasonably readable form (it certainly looks better than a binary
dump), without needing to join to a function call, which is a bit
ugly, and unmemorable.

The results from the attached look quite reasonable at first glance.
It contains a few other changes as well:

1). It exposes the schema, name and owner of the statistics object as
well via the view, for completeness.

2). It changes a few column names -- traditionally these views strip
off the column name prefix from the underlying table, so I've
attempted to be consistent with other similar views.

3). I added array-valued columns for each of the MCV list components,
which makes it more like pg_stats.

4). I moved all the permission checks to the top-level WHERE clause,
so a user needs to have select permissions on all the columns
mentioned by the statistics, and the table mustn't have RLS in effect,
otherwise the user won't see the row for that statistics object.

5). Some columns from pg_statistic_ext have to be made visible for
psql \d to work. Basically, it needs to be able to query for the
existence of extended statistics, but it doesn't need to see the
actual statistical data. Of course, we could change psql to use the
view, but this way gives us better backwards compatibility with older
clients.

This is still going to break compatibility of any user code looking at
stxndistinct or stxdependencies from pg_statistic_ext, but at least it
doesn't break old versions of psql.

Note: doc and test updates still to do.

Regards,
Dean

Attachments:

add-pg-stats-ext-v2.patchtext/x-patch; charset=US-ASCII; name=add-pg-stats-ext-v2.patchDownload

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index 566100d..3227d71
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -253,6 +253,46 @@ CREATE VIEW pg_stats WITH (security_barr
 
 REVOKE ALL on pg_statistic FROM public;
 
+CREATE VIEW pg_stats_ext WITH (security_barrier) AS
+    SELECT cn.nspname AS schemaname,
+           c.relname AS tablename,
+           sn.nspname AS statistics_schemaname,
+           s.stxname AS statistics_name,
+           pg_get_userbyid(c.relowner) AS statistics_owner,
+           ( SELECT array_agg(a.attname ORDER BY a.attnum)
+             FROM unnest(s.stxkeys) k
+                  JOIN pg_attribute a
+                       ON (a.attrelid = s.stxrelid AND a.attnum = k)
+           ) AS attnames,
+           s.stxkind AS kinds,
+           s.stxndistinct AS n_distinct,
+           s.stxdependencies AS dependencies,
+           m.most_common_vals,
+           m.most_common_val_nulls,
+           m.most_common_freqs,
+           m.most_common_base_freqs
+    FROM pg_statistic_ext s JOIN pg_class c ON (c.oid = s.stxrelid)
+         LEFT JOIN pg_namespace cn ON (cn.oid = c.relnamespace)
+         LEFT JOIN pg_namespace sn ON (sn.oid = s.stxnamespace)
+         LEFT JOIN LATERAL
+                   ( SELECT array_agg(values) AS most_common_vals,
+                            array_agg(nulls) AS most_common_val_nulls,
+                            array_agg(frequency) AS most_common_freqs,
+                            array_agg(base_frequency) AS most_common_base_freqs
+                     FROM pg_mcv_list_items(s.stxmcv)
+                   ) m ON s.stxmcv IS NOT NULL
+    WHERE NOT EXISTS
+              ( SELECT 1
+                FROM unnest(stxkeys) k
+                     JOIN pg_attribute a
+                          ON (a.attrelid = s.stxrelid AND a.attnum = k)
+                WHERE NOT has_column_privilege(c.oid, a.attnum, 'select') )
+    AND (c.relrowsecurity = false OR NOT row_security_active(c.oid));
+
+REVOKE ALL on pg_statistic_ext FROM public;
+GRANT SELECT (oid, stxrelid, stxname, stxnamespace, stxowner, stxkeys, stxkind)
+    ON pg_statistic_ext TO public;
+
 CREATE VIEW pg_publication_tables AS
     SELECT
         P.pubname AS pubname,

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Dean Rasheed (#3)

Re: Multivariate MCV stats can leak data to unprivileged users

On Thu, May 16, 2019 at 02:28:03PM +0100, Dean Rasheed wrote:

On Mon, 13 May 2019 at 23:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Yeah. I suggest we add a simple pg_stats_ext view, similar to pg_stats.
It would:

(1) translate the schema / relation / attribute names

I don't see why translating column indexes to names would be fiddly.
It's a matter of simple unnest + join, no? Or what issues you see?

Yeah, you're right. I thought it would be harder than that. One minor
thing -- I think we should have an explicit ORDER BY where we collect
the column names, to guarantee that they're listed in the right order.

Good point.

(2) include values for ndistinct / dependencies, if built

Those don't include any actual values, so this should be OK. You might
make the argument that even this does leak a bit of information (e.g.
when you can see values in one column, and you know there's a strong
functional dependence, it tells you something about the other column
which you may not see). But I think we kinda already leak information
about that through estimates, so maybe that's not an issue.

Hmm. For normal statistics, if the user has no permissions on the
table, they don't get to see any of these kinds of statistics, not
even things like n_distinct. I think we should do the same here --
i.e., if the user has no permissions on the table, don't let them see
anything. Such a user will not be able to run EXPLAIN on queries
against the table, so they won't get to see any estimates, and I don't
think they should get to see any extended statistics either.

OK, I haven't realized we don't show that even for normal stats.

(3) include MCV list only when user has access to all columns

Essentially, if the user is missing 'select' privilege on at least one
of the columns, there'll be NULL. Otherwise the MCV value.

OK, that seems reasonable, except as I said above, I think that should
apply to all statistics, not just the MCV lists.

The attached patch adds pg_stats_ext doing this. I don't claim it's the
best possible query backing the view, so any improvements are welcome.

I've been thinking we might somehow filter the statistics values, e.g. by
not showing values for attributes the user has no 'select' privilege on,
but that seems like a can of worms. It'd lead to MCV items that can't be
distinguished because the only difference was the removed attribute, and
so on. So I propose we simply show/hide the whole MCV list.

Agreed.

Likewise, I don't think it makes sense to expand the MCV list in this
view - that works for the single-dimensional case, because then the
list is expanded into two arrays (values + frequencies), which are easy
to process further. But for multivariate MCV lists that's much more
complex - we don't know how many attributes are there, for example.

So I suggest we just show the pg_mcv_list value as is, and leave it up
to the user to call the pg_mcv_list_items() function if needed.

I think expanding the MCV lists is actually quite useful because then
you can see arrays of values, nulls, frequencies and base frequencies
in a reasonably readable form (it certainly looks better than a binary
dump), without needing to join to a function call, which is a bit
ugly, and unmemorable.

Hmmm, ok. I think my main worry here is that it may or may not work for
more complex types of extended stats that are likely to come in the
future. Although, maybe it can be made work even for that.

The results from the attached look quite reasonable at first glance.
It contains a few other changes as well:

1). It exposes the schema, name and owner of the statistics object as
well via the view, for completeness.

2). It changes a few column names -- traditionally these views strip
off the column name prefix from the underlying table, so I've
attempted to be consistent with other similar views.

3). I added array-valued columns for each of the MCV list components,
which makes it more like pg_stats.

4). I moved all the permission checks to the top-level WHERE clause,
so a user needs to have select permissions on all the columns
mentioned by the statistics, and the table mustn't have RLS in effect,
otherwise the user won't see the row for that statistics object.

5). Some columns from pg_statistic_ext have to be made visible for
psql \d to work. Basically, it needs to be able to query for the
existence of extended statistics, but it doesn't need to see the
actual statistical data. Of course, we could change psql to use the
view, but this way gives us better backwards compatibility with older
clients.

This is still going to break compatibility of any user code looking at
stxndistinct or stxdependencies from pg_statistic_ext, but at least it
doesn't break old versions of psql.

Note: doc and test updates still to do.

Thanks. I'm travelling today/tomorrow, but I'll do my best to fill in the
missing bits ASAP.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Dean Rasheed (#3)

Re: Multivariate MCV stats can leak data to unprivileged users

Hi,

On 2019-05-16 14:28:03 +0100, Dean Rasheed wrote:

5). Some columns from pg_statistic_ext have to be made visible for
psql \d to work. Basically, it needs to be able to query for the
existence of extended statistics, but it doesn't need to see the
actual statistical data. Of course, we could change psql to use the
view, but this way gives us better backwards compatibility with older
clients.

This is still going to break compatibility of any user code looking at
stxndistinct or stxdependencies from pg_statistic_ext, but at least it
doesn't break old versions of psql.

Hm, it's not normally a goal to keep old psql working against new
postgres versions. And there's plenty other issues preventing a v11 psql
to work against 12. I'd not let this guide any design decisions.

Greetings,

Andres Freund

Dean Rasheed

dean.a.rasheed@gmail.com

over 6 years ago

In reply to: Andres Freund (#5)

Re: Multivariate MCV stats can leak data to unprivileged users

On Fri, 17 May 2019 at 21:29, Andres Freund <andres@anarazel.de> wrote:

On 2019-05-16 14:28:03 +0100, Dean Rasheed wrote:

5). Some columns from pg_statistic_ext have to be made visible for
psql \d to work. Basically, it needs to be able to query for the
existence of extended statistics, but it doesn't need to see the
actual statistical data. Of course, we could change psql to use the
view, but this way gives us better backwards compatibility with older
clients.

This is still going to break compatibility of any user code looking at
stxndistinct or stxdependencies from pg_statistic_ext, but at least it
doesn't break old versions of psql.

Hm, it's not normally a goal to keep old psql working against new
postgres versions. And there's plenty other issues preventing a v11 psql
to work against 12. I'd not let this guide any design decisions.

Ah good point. In fact running "\d some_table" from v11's psql against
a v12 database immediately falls over because of the removal of
relhasoids from pg_class, so this isn't a valid reason for retaining
access to any columns from pg_statistic_ext.

Regards,
Dean

Dean Rasheed

dean.a.rasheed@gmail.com

over 6 years ago

In reply to: Dean Rasheed (#6)

Re: Multivariate MCV stats can leak data to unprivileged users

On Sat, 18 May 2019 at 10:11, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On Fri, 17 May 2019 at 21:29, Andres Freund <andres@anarazel.de> wrote:

On 2019-05-16 14:28:03 +0100, Dean Rasheed wrote:

5). Some columns from pg_statistic_ext have to be made visible for
psql \d to work. Basically, it needs to be able to query for the
existence of extended statistics, but it doesn't need to see the
actual statistical data. Of course, we could change psql to use the
view, but this way gives us better backwards compatibility with older
clients.

This is still going to break compatibility of any user code looking at
stxndistinct or stxdependencies from pg_statistic_ext, but at least it
doesn't break old versions of psql.

Hm, it's not normally a goal to keep old psql working against new
postgres versions. And there's plenty other issues preventing a v11 psql
to work against 12. I'd not let this guide any design decisions.

Ah good point. In fact running "\d some_table" from v11's psql against
a v12 database immediately falls over because of the removal of
relhasoids from pg_class, so this isn't a valid reason for retaining
access to any columns from pg_statistic_ext.

On the other hand, pg_dump relies on pg_statistic_ext to work out
which extended statistics objects to dump. If we were to change that
to use pg_stats_ext, then a user dumping a table with RLS using the
--enable-row-security flag wouldn't get any extended statistics
objects, which would be a somewhat surprising result.

That could be fixed by changing the view to return rows for every
extended statistics object, nulling out values in columns that the
user doesn't have permission to see, in a similar way to Tomas'
original patch. It would have to be modified to do the RLS check in
the same place as the privilege checks, rather than in the top-level
WHERE clause, and we'd probably also have to expose OIDs in addition
to names, because that's what clients like psql and pg_dump want. To
me, that feels quite messy though, so I think I'd still vote for
leaving the first few columns of pg_statistic_ext accessible to
public, and not have to change the clients to work differently from
v12 onwards.

Regards,
Dean

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Dean Rasheed (#7)

Re: Multivariate MCV stats can leak data to unprivileged users

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On the other hand, pg_dump relies on pg_statistic_ext to work out
which extended statistics objects to dump. If we were to change that
to use pg_stats_ext, then a user dumping a table with RLS using the
--enable-row-security flag wouldn't get any extended statistics
objects, which would be a somewhat surprising result.

It seems like what we need here is to have a separation between the
*definition* of a stats object (which is what pg_dump needs access
to) and the current actual *data* in it. I'd have expected that
keeping those in separate catalogs would be the thing to do, though
perhaps it's too late for that.

regards, tom lane

Dean Rasheed

dean.a.rasheed@gmail.com

over 6 years ago

In reply to: Tom Lane (#8)

Re: Multivariate MCV stats can leak data to unprivileged users

On Sat, 18 May 2019 at 16:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On the other hand, pg_dump relies on pg_statistic_ext to work out
which extended statistics objects to dump. If we were to change that
to use pg_stats_ext, then a user dumping a table with RLS using the
--enable-row-security flag wouldn't get any extended statistics
objects, which would be a somewhat surprising result.

It seems like what we need here is to have a separation between the
*definition* of a stats object (which is what pg_dump needs access
to) and the current actual *data* in it. I'd have expected that
keeping those in separate catalogs would be the thing to do, though
perhaps it's too late for that.

Yeah, with the benefit of hindsight, that would have made sense, but
that seems like a pretty big change to be attempting at this stage.

Regards,
Dean

#10

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Dean Rasheed (#9)

Re: Multivariate MCV stats can leak data to unprivileged users

Hi,

On May 18, 2019 8:43:29 AM PDT, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On Sat, 18 May 2019 at 16:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On the other hand, pg_dump relies on pg_statistic_ext to work out
which extended statistics objects to dump. If we were to change

that

to use pg_stats_ext, then a user dumping a table with RLS using the
--enable-row-security flag wouldn't get any extended statistics
objects, which would be a somewhat surprising result.

It seems like what we need here is to have a separation between the
*definition* of a stats object (which is what pg_dump needs access
to) and the current actual *data* in it. I'd have expected that
keeping those in separate catalogs would be the thing to do, though
perhaps it's too late for that.

Yeah, with the benefit of hindsight, that would have made sense, but
that seems like a pretty big change to be attempting at this stage.

Otoh, having a suboptimal catalog representation that we'll likely have to change in one of the next releases also isn't great. Seems likely that we'll need post beta1 catversion bumps anyway?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#11

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Andres Freund (#10)

Re: Multivariate MCV stats can leak data to unprivileged users

On Sat, May 18, 2019 at 11:49:06AM -0700, Andres Freund wrote:

Hi,

On May 18, 2019 8:43:29 AM PDT, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On Sat, 18 May 2019 at 16:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On the other hand, pg_dump relies on pg_statistic_ext to work out
which extended statistics objects to dump. If we were to change

that

to use pg_stats_ext, then a user dumping a table with RLS using the
--enable-row-security flag wouldn't get any extended statistics
objects, which would be a somewhat surprising result.

It seems like what we need here is to have a separation between the
*definition* of a stats object (which is what pg_dump needs access
to) and the current actual *data* in it. I'd have expected that
keeping those in separate catalogs would be the thing to do, though
perhaps it's too late for that.

Yeah, with the benefit of hindsight, that would have made sense, but
that seems like a pretty big change to be attempting at this stage.

Otoh, having a suboptimal catalog representation that we'll likely have
to change in one of the next releases also isn't great. Seems likely
that we'll need post beta1 catversion bumps anyway?

But that's not an issue intruduced by PG12, it works like that even for
the extended statistics introduced in PG10.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Tomas Vondra (#11)

Re: Multivariate MCV stats can leak data to unprivileged users

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Sat, May 18, 2019 at 11:49:06AM -0700, Andres Freund wrote:

On Sat, 18 May 2019 at 16:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems like what we need here is to have a separation between the
*definition* of a stats object (which is what pg_dump needs access
to) and the current actual *data* in it.

Otoh, having a suboptimal catalog representation that we'll likely have
to change in one of the next releases also isn't great. Seems likely
that we'll need post beta1 catversion bumps anyway?

But that's not an issue intruduced by PG12, it works like that even for
the extended statistics introduced in PG10.

Yeah, but no time like the present to fix it if it's wrong ...

regards, tom lane

#13

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Tom Lane (#12)

Re: Multivariate MCV stats can leak data to unprivileged users

On Sat, May 18, 2019 at 03:45:11PM -0400, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Sat, May 18, 2019 at 11:49:06AM -0700, Andres Freund wrote:

On Sat, 18 May 2019 at 16:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems like what we need here is to have a separation between the
*definition* of a stats object (which is what pg_dump needs access
to) and the current actual *data* in it.

Otoh, having a suboptimal catalog representation that we'll likely have
to change in one of the next releases also isn't great. Seems likely
that we'll need post beta1 catversion bumps anyway?

But that's not an issue intruduced by PG12, it works like that even for
the extended statistics introduced in PG10.

Yeah, but no time like the present to fix it if it's wrong ...

Sorry, not sure I understand. Are you saying we should try to rework
this before the beta1 release, or that we don't have time to do that?

I think we have four options - rework it before beta1, rework it after
beta1, rework it in PG13 and leave it as it is now.

If the pg_dump thing si the only issue, maybe there's a simple solution
that reworking all the catalogs. Not sure. Are there any other reasons
why the current catalog representation would be suboptimal, or do we
have some precedent of a catalog split this way? I can't think of any.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Tomas Vondra (#13)

Re: Multivariate MCV stats can leak data to unprivileged users

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Sat, May 18, 2019 at 03:45:11PM -0400, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

But that's not an issue intruduced by PG12, it works like that even for
the extended statistics introduced in PG10.

Yeah, but no time like the present to fix it if it's wrong ...

Sorry, not sure I understand. Are you saying we should try to rework
this before the beta1 release, or that we don't have time to do that?

I think we have four options - rework it before beta1, rework it after
beta1, rework it in PG13 and leave it as it is now.

Yup, that's about what the options are. I'm just voting against
"change it in v13". If we're going to change it, then the fewer
major versions that have the bogus definition the better --- and
since we're changing that catalog in v12 anyway, users will see
fewer distinct behaviors if we do this change too.

It's very possibly too late to get it done before beta1,
unfortunately. But as Andres noted, post-beta1 catversion
bumps are hardly unusual, so I do not think "rework after
beta1" is unacceptable.

regards, tom lane

#15

Stephen Frost

sfrost@snowman.net

over 6 years ago

In reply to: Tom Lane (#14)

Re: Multivariate MCV stats can leak data to unprivileged users

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Sat, May 18, 2019 at 03:45:11PM -0400, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

But that's not an issue intruduced by PG12, it works like that even for
the extended statistics introduced in PG10.

Yeah, but no time like the present to fix it if it's wrong ...

Sorry, not sure I understand. Are you saying we should try to rework
this before the beta1 release, or that we don't have time to do that?

I think we have four options - rework it before beta1, rework it after
beta1, rework it in PG13 and leave it as it is now.

Yup, that's about what the options are. I'm just voting against
"change it in v13". If we're going to change it, then the fewer
major versions that have the bogus definition the better --- and
since we're changing that catalog in v12 anyway, users will see
fewer distinct behaviors if we do this change too.

It's very possibly too late to get it done before beta1,
unfortunately. But as Andres noted, post-beta1 catversion
bumps are hardly unusual, so I do not think "rework after
beta1" is unacceptable.

Agreed.

Thanks,

Stephen

#16

Dean Rasheed

dean.a.rasheed@gmail.com

over 6 years ago

In reply to: Stephen Frost (#15)

1 attachment(s)

Re: Multivariate MCV stats can leak data to unprivileged users

On Sun, 19 May 2019 at 00:48, Stephen Frost <sfrost@snowman.net> wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

I think we have four options - rework it before beta1, rework it after
beta1, rework it in PG13 and leave it as it is now.

Yup, that's about what the options are. I'm just voting against
"change it in v13". If we're going to change it, then the fewer
major versions that have the bogus definition the better --- and
since we're changing that catalog in v12 anyway, users will see
fewer distinct behaviors if we do this change too.

It's very possibly too late to get it done before beta1,
unfortunately. But as Andres noted, post-beta1 catversion
bumps are hardly unusual, so I do not think "rework after
beta1" is unacceptable.

Agreed.

Yes, that makes sense.

I think we shouldn't risk trying to get this into beta1, but let's try
to get it done as soon as possible after that.

Actually, it doesn't appear to be as big a change as I had feared. As
a starter for ten, here's a patch doing the basic split, moving the
extended stats data into a new catalog pg_statistic_ext_data (I'm not
particularly wedded to that name, it's just the first name that came
to mind).

With this patch the catalogs look like this:

I opted to create/remove pg_statistic_ext_data tuples at the same time
as the pg_statistic_ext tuples, in CreateStatistics() /
RemoveStatisticsById(), so then it's easier to see that they're in a
one-to-one relationship, and other code doesn't need to worry about
the data tuple not existing. The other option would be to defer
inserting the data tuple to ANALYZE.

I couldn't resist moving the code block that declares
pg_statistic_ext's indexes in indexing.h to the right place, assuming
that file is (mostly) sorted alphabetically by catalog name. This puts
the extended stats entries just after the normal stats entries which
seems preferable.

This is only a very rough first draft (e.g., no doc updates), but it
passes all the regression tests.

Regards,
Dean

Attachments:

split-up-pg-statistics-ext.patchtext/x-patch; charset=US-ASCII; name=split-up-pg-statistics-ext.patchDownload

diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
new file mode 100644
index f186198..8bece07
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -34,7 +34,7 @@ CATALOG_HEADERS := \
 	pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
 	pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
 	pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
-	pg_statistic_ext.h \
+	pg_statistic_ext.h pg_statistic_ext_data.h \
 	pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
 	pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
 	pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
new file mode 100644
index a191916..7cfaa96
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -23,6 +23,7 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "commands/comment.h"
 #include "commands/defrem.h"
 #include "miscadmin.h"
@@ -67,8 +68,11 @@ CreateStatistics(CreateStatsStmt *stmt)
 	HeapTuple	htup;
 	Datum		values[Natts_pg_statistic_ext];
 	bool		nulls[Natts_pg_statistic_ext];
+	Datum		datavalues[Natts_pg_statistic_ext_data];
+	bool		datanulls[Natts_pg_statistic_ext_data];
 	int2vector *stxkeys;
 	Relation	statrel;
+	Relation	datarel;
 	Relation	rel = NULL;
 	Oid			relid;
 	ObjectAddress parentobject,
@@ -336,11 +340,6 @@ CreateStatistics(CreateStatsStmt *stmt)
 	values[Anum_pg_statistic_ext_stxkeys - 1] = PointerGetDatum(stxkeys);
 	values[Anum_pg_statistic_ext_stxkind - 1] = PointerGetDatum(stxkind);
 
-	/* no statistics built yet */
-	nulls[Anum_pg_statistic_ext_stxndistinct - 1] = true;
-	nulls[Anum_pg_statistic_ext_stxdependencies - 1] = true;
-	nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
-
 	/* insert it into pg_statistic_ext */
 	htup = heap_form_tuple(statrel->rd_att, values, nulls);
 	CatalogTupleInsert(statrel, htup);
@@ -349,6 +348,29 @@ CreateStatistics(CreateStatsStmt *stmt)
 	relation_close(statrel, RowExclusiveLock);
 
 	/*
+	 * Also build the pg_statistic_ext_data tuple, to hold the actual
+	 * statistics data.
+	 */
+	datarel = table_open(StatisticExtDataRelationId, RowExclusiveLock);
+
+	memset(datavalues, 0, sizeof(datavalues));
+	memset(datanulls, false, sizeof(datanulls));
+
+	datavalues[Anum_pg_statistic_ext_data_stxoid - 1] = ObjectIdGetDatum(statoid);
+
+	/* no statistics built yet */
+	datanulls[Anum_pg_statistic_ext_data_stxndistinct - 1] = true;
+	datanulls[Anum_pg_statistic_ext_data_stxdependencies - 1] = true;
+	datanulls[Anum_pg_statistic_ext_data_stxmcv - 1] = true;
+
+	/* insert it into pg_statistic_ext_data */
+	htup = heap_form_tuple(datarel->rd_att, datavalues, datanulls);
+	CatalogTupleInsert(datarel, htup);
+	heap_freetuple(htup);
+
+	relation_close(datarel, RowExclusiveLock);
+
+	/*
 	 * Invalidate relcache so that others see the new statistics object.
 	 */
 	CacheInvalidateRelcache(rel);
@@ -404,6 +426,23 @@ RemoveStatisticsById(Oid statsOid)
 	Oid			relid;
 
 	/*
+	 * First delete the pg_statistic_ext_data tuple holding the actual
+	 * statistical data.
+	 */
+	relation = table_open(StatisticExtDataRelationId, RowExclusiveLock);
+
+	tup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statsOid));
+
+	if (!HeapTupleIsValid(tup)) /* should not happen */
+		elog(ERROR, "cache lookup failed for statistics data %u", statsOid);
+
+	CatalogTupleDelete(relation, &tup->t_self);
+
+	ReleaseSysCache(tup);
+
+	table_close(relation, RowExclusiveLock);
+
+	/*
 	 * Delete the pg_statistic_ext tuple.  Also send out a cache inval on the
 	 * associated table, so that dependent plans will be rebuilt.
 	 */
@@ -431,8 +470,8 @@ RemoveStatisticsById(Oid statsOid)
  *
  * This could throw an error if the type change can't be supported.
  * If it can be supported, but the stats must be recomputed, a likely choice
- * would be to set the relevant column(s) of the pg_statistic_ext tuple to
- * null until the next ANALYZE.  (Note that the type change hasn't actually
+ * would be to set the relevant column(s) of the pg_statistic_ext_data tuple
+ * to null until the next ANALYZE.  (Note that the type change hasn't actually
  * happened yet, so one option that's *not* on the table is to recompute
  * immediately.)
  *
@@ -456,11 +495,11 @@ UpdateStatisticsForTypeChange(Oid statsO
 
 	Relation	rel;
 
-	Datum		values[Natts_pg_statistic_ext];
-	bool		nulls[Natts_pg_statistic_ext];
-	bool		replaces[Natts_pg_statistic_ext];
+	Datum		values[Natts_pg_statistic_ext_data];
+	bool		nulls[Natts_pg_statistic_ext_data];
+	bool		replaces[Natts_pg_statistic_ext_data];
 
-	oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statsOid));
+	oldtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statsOid));
 	if (!HeapTupleIsValid(oldtup))
 		elog(ERROR, "cache lookup failed for statistics object %u", statsOid);
 
@@ -479,14 +518,14 @@ UpdateStatisticsForTypeChange(Oid statsO
 	 * OK, we need to reset some statistics. So let's build the new tuple,
 	 * replacing the affected statistics types with NULL.
 	 */
-	memset(nulls, 0, Natts_pg_statistic_ext * sizeof(bool));
-	memset(replaces, 0, Natts_pg_statistic_ext * sizeof(bool));
-	memset(values, 0, Natts_pg_statistic_ext * sizeof(Datum));
+	memset(nulls, 0, Natts_pg_statistic_ext_data * sizeof(bool));
+	memset(replaces, 0, Natts_pg_statistic_ext_data * sizeof(bool));
+	memset(values, 0, Natts_pg_statistic_ext_data * sizeof(Datum));
 
-	replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
-	nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxmcv - 1] = true;
+	nulls[Anum_pg_statistic_ext_data_stxmcv - 1] = true;
 
-	rel = heap_open(StatisticExtRelationId, RowExclusiveLock);
+	rel = heap_open(StatisticExtDataRelationId, RowExclusiveLock);
 
 	/* replace the old tuple */
 	stup = heap_modify_tuple(oldtup,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
new file mode 100644
index 80441de..a72ddd1
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1308,6 +1308,7 @@ get_relation_statistics(RelOptInfo *rel,
 		Oid			statOid = lfirst_oid(l);
 		Form_pg_statistic_ext staForm;
 		HeapTuple	htup;
+		HeapTuple	dtup;
 		Bitmapset  *keys = NULL;
 		int			i;
 
@@ -1316,6 +1317,10 @@ get_relation_statistics(RelOptInfo *rel,
 			elog(ERROR, "cache lookup failed for statistics object %u", statOid);
 		staForm = (Form_pg_statistic_ext) GETSTRUCT(htup);
 
+		dtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statOid));
+		if (!HeapTupleIsValid(dtup))
+			elog(ERROR, "cache lookup failed for statistics object %u", statOid);
+
 		/*
 		 * First, build the array of columns covered.  This is ultimately
 		 * wasted if no stats within the object have actually been built, but
@@ -1325,7 +1330,7 @@ get_relation_statistics(RelOptInfo *rel,
 			keys = bms_add_member(keys, staForm->stxkeys.values[i]);
 
 		/* add one StatisticExtInfo for each kind built */
-		if (statext_is_kind_built(htup, STATS_EXT_NDISTINCT))
+		if (statext_is_kind_built(dtup, STATS_EXT_NDISTINCT))
 		{
 			StatisticExtInfo *info = makeNode(StatisticExtInfo);
 
@@ -1337,7 +1342,7 @@ get_relation_statistics(RelOptInfo *rel,
 			stainfos = lcons(info, stainfos);
 		}
 
-		if (statext_is_kind_built(htup, STATS_EXT_DEPENDENCIES))
+		if (statext_is_kind_built(dtup, STATS_EXT_DEPENDENCIES))
 		{
 			StatisticExtInfo *info = makeNode(StatisticExtInfo);
 
@@ -1349,7 +1354,7 @@ get_relation_statistics(RelOptInfo *rel,
 			stainfos = lcons(info, stainfos);
 		}
 
-		if (statext_is_kind_built(htup, STATS_EXT_MCV))
+		if (statext_is_kind_built(dtup, STATS_EXT_MCV))
 		{
 			StatisticExtInfo *info = makeNode(StatisticExtInfo);
 
@@ -1362,6 +1367,7 @@ get_relation_statistics(RelOptInfo *rel,
 		}
 
 		ReleaseSysCache(htup);
+		ReleaseSysCache(dtup);
 		bms_free(keys);
 	}
 
diff --git a/src/backend/statistics/README.mcv b/src/backend/statistics/README.mcv
new file mode 100644
index c18878f..4733341
--- a/src/backend/statistics/README.mcv
+++ b/src/backend/statistics/README.mcv
@@ -86,11 +86,14 @@ So instead the MCV lists are stored in a
 which however makes it more difficult to inspect the contents. To make that
 easier, there's a SRF returning detailed information about the MCV lists.
 
-    SELECT m.* FROM pg_statistic_ext,
-                    pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts2';
+    SELECT m.* FROM pg_statistic_ext s,
+                    pg_statistic_ext_data d,
+                    pg_mcv_list_items(stxmcv) m
+              WHERE s.stxname = 'stts2'
+                AND d.stxoid = s.oid;
 
 It accepts one parameter - a pg_mcv_list value (which can only be obtained
-from pg_statistic_ext catalog, to defend against malicious input), and
+from pg_statistic_ext_data catalog, to defend against malicious input), and
 returns these columns:
 
     - item index (0, ..., (nitems-1))
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
new file mode 100644
index 0b26e41..43590f6
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -17,6 +17,7 @@
 #include "access/sysattr.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "lib/stringinfo.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
@@ -639,12 +640,12 @@ statext_dependencies_load(Oid mvoid)
 	Datum		deps;
 	HeapTuple	htup;
 
-	htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
+	htup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(mvoid));
 	if (!HeapTupleIsValid(htup))
 		elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
 
-	deps = SysCacheGetAttr(STATEXTOID, htup,
-						   Anum_pg_statistic_ext_stxdependencies, &isnull);
+	deps = SysCacheGetAttr(STATEXTDATASTXOID, htup,
+						   Anum_pg_statistic_ext_data_stxdependencies, &isnull);
 	if (isnull)
 		elog(ERROR,
 			 "requested statistic kind \"%c\" is not yet built for statistics object %u",
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
new file mode 100644
index ac0ae52..a49b340
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -23,6 +23,7 @@
 #include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
@@ -65,9 +66,9 @@ typedef struct StatExtEntry
 static List *fetch_statentries_for_relation(Relation pg_statext, Oid relid);
 static VacAttrStats **lookup_var_attr_stats(Relation rel, Bitmapset *attrs,
 					  int nvacatts, VacAttrStats **vacatts);
-static void statext_store(Relation pg_stext, Oid relid,
+static void statext_store(Oid statOid,
 			  MVNDistinct *ndistinct, MVDependencies *dependencies,
-			  MCVList * mcvlist, VacAttrStats **stats);
+			  MCVList * mcv, VacAttrStats **stats);
 
 
 /*
@@ -145,7 +146,7 @@ BuildRelationExtStatistics(Relation oner
 		}
 
 		/* store the statistics in the catalog */
-		statext_store(pg_stext, stat->statOid, ndistinct, dependencies, mcv, stats);
+		statext_store(stat->statOid, ndistinct, dependencies, mcv, stats);
 	}
 
 	table_close(pg_stext, RowExclusiveLock);
@@ -156,7 +157,7 @@ BuildRelationExtStatistics(Relation oner
 
 /*
  * statext_is_kind_built
- *		Is this stat kind built in the given pg_statistic_ext tuple?
+ *		Is this stat kind built in the given pg_statistic_ext_data tuple?
  */
 bool
 statext_is_kind_built(HeapTuple htup, char type)
@@ -166,15 +167,15 @@ statext_is_kind_built(HeapTuple htup, ch
 	switch (type)
 	{
 		case STATS_EXT_NDISTINCT:
-			attnum = Anum_pg_statistic_ext_stxndistinct;
+			attnum = Anum_pg_statistic_ext_data_stxndistinct;
 			break;
 
 		case STATS_EXT_DEPENDENCIES:
-			attnum = Anum_pg_statistic_ext_stxdependencies;
+			attnum = Anum_pg_statistic_ext_data_stxdependencies;
 			break;
 
 		case STATS_EXT_MCV:
-			attnum = Anum_pg_statistic_ext_stxmcv;
+			attnum = Anum_pg_statistic_ext_data_stxmcv;
 			break;
 
 		default:
@@ -312,70 +313,77 @@ lookup_var_attr_stats(Relation rel, Bitm
 
 /*
  * statext_store
- *	Serializes the statistics and stores them into the pg_statistic_ext tuple.
+ *	Serializes the statistics and stores them into the pg_statistic_ext_data
+ *	tuple.
  */
 static void
-statext_store(Relation pg_stext, Oid statOid,
+statext_store(Oid statOid,
 			  MVNDistinct *ndistinct, MVDependencies *dependencies,
 			  MCVList * mcv, VacAttrStats **stats)
 {
 	HeapTuple	stup,
 				oldtup;
-	Datum		values[Natts_pg_statistic_ext];
-	bool		nulls[Natts_pg_statistic_ext];
-	bool		replaces[Natts_pg_statistic_ext];
+	Datum		values[Natts_pg_statistic_ext_data];
+	bool		nulls[Natts_pg_statistic_ext_data];
+	bool		replaces[Natts_pg_statistic_ext_data];
+	Relation	pg_stextdata;
 
 	memset(nulls, true, sizeof(nulls));
 	memset(replaces, false, sizeof(replaces));
 	memset(values, 0, sizeof(values));
 
 	/*
-	 * Construct a new pg_statistic_ext tuple, replacing the calculated stats.
+	 * Construct a new pg_statistic_ext_data tuple, replacing the calculated
+	 * stats.
 	 */
 	if (ndistinct != NULL)
 	{
 		bytea	   *data = statext_ndistinct_serialize(ndistinct);
 
-		nulls[Anum_pg_statistic_ext_stxndistinct - 1] = (data == NULL);
-		values[Anum_pg_statistic_ext_stxndistinct - 1] = PointerGetDatum(data);
+		nulls[Anum_pg_statistic_ext_data_stxndistinct - 1] = (data == NULL);
+		values[Anum_pg_statistic_ext_data_stxndistinct - 1] = PointerGetDatum(data);
 	}
 
 	if (dependencies != NULL)
 	{
 		bytea	   *data = statext_dependencies_serialize(dependencies);
 
-		nulls[Anum_pg_statistic_ext_stxdependencies - 1] = (data == NULL);
-		values[Anum_pg_statistic_ext_stxdependencies - 1] = PointerGetDatum(data);
+		nulls[Anum_pg_statistic_ext_data_stxdependencies - 1] = (data == NULL);
+		values[Anum_pg_statistic_ext_data_stxdependencies - 1] = PointerGetDatum(data);
 	}
 
 	if (mcv != NULL)
 	{
 		bytea	   *data = statext_mcv_serialize(mcv, stats);
 
-		nulls[Anum_pg_statistic_ext_stxmcv - 1] = (data == NULL);
-		values[Anum_pg_statistic_ext_stxmcv - 1] = PointerGetDatum(data);
+		nulls[Anum_pg_statistic_ext_data_stxmcv - 1] = (data == NULL);
+		values[Anum_pg_statistic_ext_data_stxmcv - 1] = PointerGetDatum(data);
 	}
 
 	/* always replace the value (either by bytea or NULL) */
-	replaces[Anum_pg_statistic_ext_stxndistinct - 1] = true;
-	replaces[Anum_pg_statistic_ext_stxdependencies - 1] = true;
-	replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxndistinct - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxdependencies - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxmcv - 1] = true;
 
-	/* there should already be a pg_statistic_ext tuple */
-	oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statOid));
+	/* there should already be a pg_statistic_ext_data tuple */
+	oldtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statOid));
 	if (!HeapTupleIsValid(oldtup))
 		elog(ERROR, "cache lookup failed for statistics object %u", statOid);
 
 	/* replace it */
+	pg_stextdata = table_open(StatisticExtDataRelationId, RowExclusiveLock);
+
 	stup = heap_modify_tuple(oldtup,
-							 RelationGetDescr(pg_stext),
+							 RelationGetDescr(pg_stextdata),
 							 values,
 							 nulls,
 							 replaces);
 	ReleaseSysCache(oldtup);
-	CatalogTupleUpdate(pg_stext, &stup->t_self, stup);
+	CatalogTupleUpdate(pg_stextdata, &stup->t_self, stup);
 
 	heap_freetuple(stup);
+
+	table_close(pg_stextdata, RowExclusiveLock);
 }
 
 /* initialize multi-dimensional sort */
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
new file mode 100644
index 05ab6c9..998a1dc
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -19,6 +19,7 @@
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "fmgr.h"
 #include "funcapi.h"
 #include "nodes/nodeFuncs.h"
@@ -429,13 +430,13 @@ statext_mcv_load(Oid mvoid)
 	MCVList    *result;
 	bool		isnull;
 	Datum		mcvlist;
-	HeapTuple	htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
+	HeapTuple	htup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(mvoid));
 
 	if (!HeapTupleIsValid(htup))
 		elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
 
-	mcvlist = SysCacheGetAttr(STATEXTOID, htup,
-							  Anum_pg_statistic_ext_stxmcv, &isnull);
+	mcvlist = SysCacheGetAttr(STATEXTDATASTXOID, htup,
+							  Anum_pg_statistic_ext_data_stxmcv, &isnull);
 
 	if (isnull)
 		elog(ERROR,
diff --git a/src/backend/statistics/mvdistinct.c b/src/backend/statistics/mvdistinct.c
new file mode 100644
index 133503c..1c857e5
--- a/src/backend/statistics/mvdistinct.c
+++ b/src/backend/statistics/mvdistinct.c
@@ -27,6 +27,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "lib/stringinfo.h"
@@ -145,12 +146,12 @@ statext_ndistinct_load(Oid mvoid)
 	Datum		ndist;
 	HeapTuple	htup;
 
-	htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
+	htup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(mvoid));
 	if (!HeapTupleIsValid(htup))
 		elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
 
-	ndist = SysCacheGetAttr(STATEXTOID, htup,
-							Anum_pg_statistic_ext_stxndistinct, &isnull);
+	ndist = SysCacheGetAttr(STATEXTDATASTXOID, htup,
+							Anum_pg_statistic_ext_data_stxndistinct, &isnull);
 	if (isnull)
 		elog(ERROR,
 			 "requested statistic kind \"%c\" is not yet built for statistics object %u",
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
new file mode 100644
index ac98c19..20a7de2
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -62,6 +62,7 @@
 #include "catalog/pg_replication_origin.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "catalog/pg_subscription.h"
 #include "catalog/pg_subscription_rel.h"
 #include "catalog/pg_tablespace.h"
@@ -727,6 +728,17 @@ static const struct cachedesc cacheinfo[
 		},
 		32
 	},
+	{StatisticExtDataRelationId, /* STATEXTDATASTXOID */
+		StatisticExtDataStxoidIndexId,
+		1,
+		{
+			Anum_pg_statistic_ext_data_stxoid,
+			0,
+			0,
+			0
+		},
+		4
+	},
 	{StatisticExtRelationId,	/* STATEXTNAMENSP */
 		StatisticExtNameIndexId,
 		2,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
new file mode 100644
index f253613..8b51aca
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -186,13 +186,6 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid
 DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
 #define LargeObjectMetadataOidIndexId	2996
 
-DECLARE_UNIQUE_INDEX(pg_statistic_ext_oid_index, 3380, on pg_statistic_ext using btree(oid oid_ops));
-#define StatisticExtOidIndexId	3380
-DECLARE_UNIQUE_INDEX(pg_statistic_ext_name_index, 3997, on pg_statistic_ext using btree(stxname name_ops, stxnamespace oid_ops));
-#define StatisticExtNameIndexId 3997
-DECLARE_INDEX(pg_statistic_ext_relid_index, 3379, on pg_statistic_ext using btree(stxrelid oid_ops));
-#define StatisticExtRelidIndexId 3379
-
 DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
 #define NamespaceNameIndexId  2684
 DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
@@ -237,6 +230,16 @@ DECLARE_INDEX(pg_shdepend_reference_inde
 DECLARE_UNIQUE_INDEX(pg_statistic_relid_att_inh_index, 2696, on pg_statistic using btree(starelid oid_ops, staattnum int2_ops, stainherit bool_ops));
 #define StatisticRelidAttnumInhIndexId	2696
 
+DECLARE_UNIQUE_INDEX(pg_statistic_ext_oid_index, 3380, on pg_statistic_ext using btree(oid oid_ops));
+#define StatisticExtOidIndexId	3380
+DECLARE_UNIQUE_INDEX(pg_statistic_ext_name_index, 3997, on pg_statistic_ext using btree(stxname name_ops, stxnamespace oid_ops));
+#define StatisticExtNameIndexId 3997
+DECLARE_INDEX(pg_statistic_ext_relid_index, 3379, on pg_statistic_ext using btree(stxrelid oid_ops));
+#define StatisticExtRelidIndexId 3379
+
+DECLARE_UNIQUE_INDEX(pg_statistic_ext_data_stxoid_index, 3433, on pg_statistic_ext_data using btree(stxoid oid_ops));
+#define StatisticExtDataStxoidIndexId 3433
+
 DECLARE_UNIQUE_INDEX(pg_tablespace_oid_index, 2697, on pg_tablespace using btree(oid oid_ops));
 #define TablespaceOidIndexId  2697
 DECLARE_UNIQUE_INDEX(pg_tablespace_spcname_index, 2698, on pg_tablespace using btree(spcname name_ops));
diff --git a/src/include/catalog/pg_statistic_ext.h b/src/include/catalog/pg_statistic_ext.h
new file mode 100644
index e449f9e..d8c5e06
--- a/src/include/catalog/pg_statistic_ext.h
+++ b/src/include/catalog/pg_statistic_ext.h
@@ -1,8 +1,12 @@
 /*-------------------------------------------------------------------------
  *
  * pg_statistic_ext.h
- *	  definition of the "extended statistics" system catalog (pg_statistic_ext)
+ *	  definition of the "extended statistics" system catalog
+ *	  (pg_statistic_ext)
  *
+ * Note that pg_statistic_ext contains the definitions of extended statistics
+ * objects, created by CREATE STATISTICS, but not the actual statistical data,
+ * created by running ANALYZE.
  *
  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -47,9 +51,6 @@ CATALOG(pg_statistic_ext,3381,StatisticE
 #ifdef CATALOG_VARLEN
 	char		stxkind[1] BKI_FORCE_NOT_NULL;	/* statistics kinds requested
 												 * to build */
-	pg_ndistinct stxndistinct;	/* ndistinct coefficients (serialized) */
-	pg_dependencies stxdependencies;	/* dependencies (serialized) */
-	pg_mcv_list stxmcv;			/* MCV (serialized) */
 #endif
 
 } FormData_pg_statistic_ext;
diff --git a/src/include/catalog/pg_statistic_ext_data.h b/src/include/catalog/pg_statistic_ext_data.h
new file mode 100644
index ...76a38d7
--- a/src/include/catalog/pg_statistic_ext_data.h
+++ b/src/include/catalog/pg_statistic_ext_data.h
@@ -0,0 +1,52 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_statistic_ext_data.h
+ *	  definition of the "extended statistics data" system catalog
+ *	  (pg_statistic_ext_data)
+ *
+ * This catalog stores the statistical data for extended statistics objects.
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_statistic_ext_data.h
+ *
+ * NOTES
+ *	  The Catalog.pm module reads this file and derives schema
+ *	  information.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_STATISTIC_EXT_DATA_H
+#define PG_STATISTIC_EXT_DATA_H
+
+#include "catalog/genbki.h"
+#include "catalog/pg_statistic_ext_data_d.h"
+
+/* ----------------
+ *		pg_statistic_ext_data definition.  cpp turns this into
+ *		typedef struct FormData_pg_statistic_ext_data
+ * ----------------
+ */
+CATALOG(pg_statistic_ext_data,3429,StatisticExtDataRelationId)
+{
+	Oid			stxoid;			/* statistics object this data is for */
+
+#ifdef CATALOG_VARLEN			/* variable-length fields start here */
+
+	pg_ndistinct stxndistinct;	/* ndistinct coefficients (serialized) */
+	pg_dependencies stxdependencies;	/* dependencies (serialized) */
+	pg_mcv_list stxmcv;			/* MCV (serialized) */
+
+#endif
+
+} FormData_pg_statistic_ext_data;
+
+/* ----------------
+ *		Form_pg_statistic_ext_data corresponds to a pointer to a tuple with
+ *		the format of pg_statistic_ext_data relation.
+ * ----------------
+ */
+typedef FormData_pg_statistic_ext_data *Form_pg_statistic_ext_data;
+
+#endif							/* PG_STATISTIC_EXT_DATA_H */
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
new file mode 100644
index 5ee628c..e73c570
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -70,6 +70,7 @@ DECLARE_TOAST(pg_rewrite, 2838, 2839);
 DECLARE_TOAST(pg_seclabel, 3598, 3599);
 DECLARE_TOAST(pg_statistic, 2840, 2841);
 DECLARE_TOAST(pg_statistic_ext, 3439, 3440);
+DECLARE_TOAST(pg_statistic_ext_data, 3430, 3431);
 DECLARE_TOAST(pg_trigger, 2336, 2337);
 DECLARE_TOAST(pg_ts_dict, 4169, 4170);
 DECLARE_TOAST(pg_type, 4171, 4172);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
new file mode 100644
index 95ee489..1fa63aa
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -86,6 +86,7 @@ enum SysCacheIdentifier
 	REPLORIGNAME,
 	RULERELNAME,
 	SEQRELID,
+	STATEXTDATASTXOID,
 	STATEXTNAMENSP,
 	STATEXTOID,
 	STATRELATTINH,
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
new file mode 100644
index 4edc817..1302cc2
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -985,6 +985,14 @@ WHERE	stxowner != 0 AND
 ------+----------
 (0 rows)
 
+SELECT	ctid, stxoid
+FROM	pg_catalog.pg_statistic_ext_data fk
+WHERE	stxoid != 0 AND
+	NOT EXISTS(SELECT 1 FROM pg_catalog.pg_statistic_ext pk WHERE pk.oid = fk.stxoid);
+ ctid | stxoid 
+------+--------
+(0 rows)
+
 SELECT	ctid, spcowner
 FROM	pg_catalog.pg_tablespace fk
 WHERE	spcowner != 0 AND
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
new file mode 100644
index 392e8a4..8ff0da1
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -149,6 +149,7 @@ pg_shdescription|t
 pg_shseclabel|t
 pg_statistic|t
 pg_statistic_ext|t
+pg_statistic_ext_data|t
 pg_subscription|t
 pg_subscription_rel|t
 pg_tablespace|t
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
new file mode 100644
index 6dfca7a..afdd14a
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -199,8 +199,10 @@ SELECT * FROM check_estimated_rows('SELE
 -- correct command
 CREATE STATISTICS s10 ON a, b, c FROM ndistinct;
 ANALYZE ndistinct;
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
  stxkind |                    stxndistinct                     
 ---------+-----------------------------------------------------
  {d,f,m} | {"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}
@@ -246,8 +248,10 @@ INSERT INTO ndistinct (a, b, c, filler1)
             cash_words(mod(i,33)::int::money)
        FROM generate_series(1,5000) s(i);
 ANALYZE ndistinct;
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
  stxkind |                        stxndistinct                        
 ---------+------------------------------------------------------------
  {d,f,m} | {"3, 4": 2550, "3, 6": 800, "4, 6": 1632, "3, 4, 6": 5000}
@@ -285,8 +289,10 @@ SELECT * FROM check_estimated_rows('SELE
 (1 row)
 
 DROP STATISTICS s10;
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
  stxkind | stxndistinct 
 ---------+--------------
 (0 rows)
@@ -537,7 +543,10 @@ SELECT * FROM check_estimated_rows('SELE
 
 -- check change of unrelated column type does not reset the MCV statistics
 ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
-SELECT stxmcv IS NOT NULL FROM pg_statistic_ext WHERE stxname = 'mcv_lists_stats';
+SELECT d.stxmcv IS NOT NULL
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
  ?column? 
 ----------
  t
@@ -600,8 +609,11 @@ SELECT * FROM check_estimated_rows('SELE
 TRUNCATE mcv_lists;
 INSERT INTO mcv_lists (a, b, c) SELECT 1, 2, 3 FROM generate_series(1,1000) s(i);
 ANALYZE mcv_lists;
-SELECT m.* FROM pg_statistic_ext,
-              pg_mcv_list_items(stxmcv) m WHERE stxname = 'mcv_lists_stats';
+SELECT m.*
+  FROM pg_statistic_ext s, pg_statistic_ext_data d,
+       pg_mcv_list_items(d.stxmcv) m
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
  index |  values   |  nulls  | frequency | base_frequency 
 -------+-----------+---------+-----------+----------------
      0 | {1, 2, 3} | {f,f,f} |         1 |              1
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
new file mode 100644
index dbe4a58..b774cbc
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -493,6 +493,10 @@ SELECT	ctid, stxowner
 FROM	pg_catalog.pg_statistic_ext fk
 WHERE	stxowner != 0 AND
 	NOT EXISTS(SELECT 1 FROM pg_catalog.pg_authid pk WHERE pk.oid = fk.stxowner);
+SELECT	ctid, stxoid
+FROM	pg_catalog.pg_statistic_ext_data fk
+WHERE	stxoid != 0 AND
+	NOT EXISTS(SELECT 1 FROM pg_catalog.pg_statistic_ext pk WHERE pk.oid = fk.stxoid);
 SELECT	ctid, spcowner
 FROM	pg_catalog.pg_tablespace fk
 WHERE	spcowner != 0 AND
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
new file mode 100644
index c6a5776..7454f64
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -144,8 +144,10 @@ CREATE STATISTICS s10 ON a, b, c FROM nd
 
 ANALYZE ndistinct;
 
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
 
 -- Hash Aggregate, thanks to estimates improved by the statistic
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, b');
@@ -170,8 +172,10 @@ INSERT INTO ndistinct (a, b, c, filler1)
 
 ANALYZE ndistinct;
 
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
 
 -- correct esimates
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, b');
@@ -186,8 +190,10 @@ SELECT * FROM check_estimated_rows('SELE
 
 DROP STATISTICS s10;
 
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
 
 -- dropping the statistics results in under-estimates
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, b');
@@ -335,7 +341,10 @@ SELECT * FROM check_estimated_rows('SELE
 -- check change of unrelated column type does not reset the MCV statistics
 ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
 
-SELECT stxmcv IS NOT NULL FROM pg_statistic_ext WHERE stxname = 'mcv_lists_stats';
+SELECT d.stxmcv IS NOT NULL
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
 
 -- check change of column type resets the MCV statistics
 ALTER TABLE mcv_lists ALTER COLUMN c TYPE numeric;
@@ -378,8 +387,11 @@ TRUNCATE mcv_lists;
 INSERT INTO mcv_lists (a, b, c) SELECT 1, 2, 3 FROM generate_series(1,1000) s(i);
 ANALYZE mcv_lists;
 
-SELECT m.* FROM pg_statistic_ext,
-              pg_mcv_list_items(stxmcv) m WHERE stxname = 'mcv_lists_stats';
+SELECT m.*
+  FROM pg_statistic_ext s, pg_statistic_ext_data d,
+       pg_mcv_list_items(d.stxmcv) m
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
 
 -- mcv with arrays
 CREATE TABLE mcv_lists_arrays (
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
new file mode 100644
index c6050a3..8c9712c
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -719,6 +719,7 @@ FormData_pg_sequence_data
 FormData_pg_shdepend
 FormData_pg_statistic
 FormData_pg_statistic_ext
+FormData_pg_statistic_ext_data
 FormData_pg_subscription
 FormData_pg_subscription_rel
 FormData_pg_tablespace
@@ -776,6 +777,7 @@ Form_pg_sequence_data
 Form_pg_shdepend
 Form_pg_statistic
 Form_pg_statistic_ext
+Form_pg_statistic_ext_data
 Form_pg_subscription
 Form_pg_subscription_rel
 Form_pg_tablespace

#17

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Dean Rasheed (#16)

Re: Multivariate MCV stats can leak data to unprivileged users

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

I think we shouldn't risk trying to get this into beta1, but let's try
to get it done as soon as possible after that.

Agreed.

\d pg_statistic_ext
Table "pg_catalog.pg_statistic_ext"
Column | Type | Collation | Nullable | Default
--------------+------------+-----------+----------+---------
oid | oid | | not null |
stxrelid | oid | | not null |
stxname | name | | not null |
stxnamespace | oid | | not null |
stxowner | oid | | not null |
stxkeys | int2vector | | not null |
stxkind | "char"[] | | not null |
Indexes:
"pg_statistic_ext_name_index" UNIQUE, btree (stxname, stxnamespace)
"pg_statistic_ext_oid_index" UNIQUE, btree (oid)
"pg_statistic_ext_relid_index" btree (stxrelid)

Check.

\d pg_statistic_ext_data
Table "pg_catalog.pg_statistic_ext_data"
Column | Type | Collation | Nullable | Default
-----------------+-----------------+-----------+----------+---------
stxoid | oid | | not null |
stxndistinct | pg_ndistinct | C | |
stxdependencies | pg_dependencies | C | |
stxmcv | pg_mcv_list | C | |
Indexes:
"pg_statistic_ext_data_stxoid_index" UNIQUE, btree (stxoid)

I wonder ... another way we could potentially do this is

create table pg_statistic_ext_data(
stxoid oid, -- OID of owning pg_statistic_ext entry
stxkind char, -- what kind of data
stxdata bytea -- the data, in some format or other
);

The advantage of this way is that we'd not have to rejigger the
catalog's rowtype every time we think of a new kind of extended
stats. The disadvantage is that manual inspection of the contents
of an entry would become much harder, for lack of any convenient
output function. However, this whole exercise is mostly to prevent
casual manual inspection anyway :-(, so I wonder how much we care
about that.

Also, I assume there's going to be a user-accessible view that shows
a join of these tables, but only those rows that correspond to columns
the current user can read all of. Should we give that view the name
pg_statistic_ext for maximum backward compatibility? I'm not sure.
pg_dump would probably prefer it if the view is what has a new name,
but other clients might like the other way.

regards, tom lane

#18

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Tom Lane (#17)

Re: Multivariate MCV stats can leak data to unprivileged users

I wrote:

I wonder ... another way we could potentially do this is

create table pg_statistic_ext_data(
stxoid oid, -- OID of owning pg_statistic_ext entry
stxkind char, -- what kind of data
stxdata bytea -- the data, in some format or other
);

The advantage of this way is that we'd not have to rejigger the
catalog's rowtype every time we think of a new kind of extended
stats. The disadvantage is that manual inspection of the contents
of an entry would become much harder, for lack of any convenient
output function.

No, wait, scratch that. We could fold the three existing types
pg_ndistinct, pg_dependencies, pg_mcv_list into one new type, say
"pg_stats_ext_data", where the actual storage would need to have an
ID field (so we'd waste a byte or two duplicating the externally
visible stxkind field inside stxdata). The output function for this
type is just a switch over the existing code. The big advantage of
this way compared to the current approach is that adding a new
ext-stats type requires *zero* work with adding new catalog entries.
Just add another switch case in pg_stats_ext_data_out() and you're
done.

regards, tom lane

#19

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Tom Lane (#18)

Re: Multivariate MCV stats can leak data to unprivileged users

On Sun, May 19, 2019 at 10:28:43AM -0400, Tom Lane wrote:

I wrote:

I wonder ... another way we could potentially do this is

create table pg_statistic_ext_data(
stxoid oid, -- OID of owning pg_statistic_ext entry
stxkind char, -- what kind of data
stxdata bytea -- the data, in some format or other
);

The advantage of this way is that we'd not have to rejigger the
catalog's rowtype every time we think of a new kind of extended
stats. The disadvantage is that manual inspection of the contents
of an entry would become much harder, for lack of any convenient
output function.

No, wait, scratch that. We could fold the three existing types
pg_ndistinct, pg_dependencies, pg_mcv_list into one new type, say
"pg_stats_ext_data", where the actual storage would need to have an
ID field (so we'd waste a byte or two duplicating the externally
visible stxkind field inside stxdata). The output function for this
type is just a switch over the existing code. The big advantage of
this way compared to the current approach is that adding a new
ext-stats type requires *zero* work with adding new catalog entries.
Just add another switch case in pg_stats_ext_data_out() and you're
done.

The annoying thing is that this undoes the protections provided by special
data types generated only in internally. It's not possible to generate
e.g. pg_mcv_list values in user code (except for C code, at which points
all bets are off anyway). By abandoning this and reverting to bytea anyone
could craft a bytea and claim it's a statistic value.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20

Dean Rasheed

dean.a.rasheed@gmail.com

over 6 years ago

In reply to: Tom Lane (#18)

Re: Multivariate MCV stats can leak data to unprivileged users

On Sun, 19 May 2019 at 15:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wonder ... another way we could potentially do this is

create table pg_statistic_ext_data(
stxoid oid, -- OID of owning pg_statistic_ext entry
stxkind char, -- what kind of data
stxdata bytea -- the data, in some format or other
);

The advantage of this way is that we'd not have to rejigger the
catalog's rowtype every time we think of a new kind of extended
stats. The disadvantage is that manual inspection of the contents
of an entry would become much harder, for lack of any convenient
output function.

No, wait, scratch that. We could fold the three existing types
pg_ndistinct, pg_dependencies, pg_mcv_list into one new type, say
"pg_stats_ext_data", where the actual storage would need to have an
ID field (so we'd waste a byte or two duplicating the externally
visible stxkind field inside stxdata). The output function for this
type is just a switch over the existing code. The big advantage of
this way compared to the current approach is that adding a new
ext-stats type requires *zero* work with adding new catalog entries.
Just add another switch case in pg_stats_ext_data_out() and you're
done.

This feels a little over-engineered to me. Presumably there'd be a
compound key on (stxoid, stxkind) and we'd have to scan multiple rows
to get all the applicable stats, whereas currently they're all in one
row. And then the user-accessible view would probably need separate
sub-queries for each stats kind.

If the point is just to avoid adding columns to the catalog in future
releases, I'm not sure it's worth the added complexity. We know that
we will probably add histogram stats in a future release. I'm not sure
how many more kinds we'll end up adding, but it doesn't seem likely to
be a huge number. I think we'll add far more columns to other catalog
tables as we add new features to each release.

Regards,
Dean

#21

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Tomas Vondra (#19)

Re: Multivariate MCV stats can leak data to unprivileged users

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Sun, May 19, 2019 at 10:28:43AM -0400, Tom Lane wrote:

No, wait, scratch that. We could fold the three existing types
pg_ndistinct, pg_dependencies, pg_mcv_list into one new type, say
"pg_stats_ext_data", where the actual storage would need to have an
ID field (so we'd waste a byte or two duplicating the externally
visible stxkind field inside stxdata). The output function for this
type is just a switch over the existing code. The big advantage of
this way compared to the current approach is that adding a new
ext-stats type requires *zero* work with adding new catalog entries.
Just add another switch case in pg_stats_ext_data_out() and you're
done.

The annoying thing is that this undoes the protections provided by special
data types generated only in internally. It's not possible to generate
e.g. pg_mcv_list values in user code (except for C code, at which points
all bets are off anyway). By abandoning this and reverting to bytea anyone
could craft a bytea and claim it's a statistic value.

That would have been true of the original proposal, but not of this
modified one.

regards, tom lane

#22

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Tom Lane (#21)

Re: Multivariate MCV stats can leak data to unprivileged users

On Sun, May 19, 2019 at 02:14:54PM -0400, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Sun, May 19, 2019 at 10:28:43AM -0400, Tom Lane wrote:

No, wait, scratch that. We could fold the three existing types
pg_ndistinct, pg_dependencies, pg_mcv_list into one new type, say
"pg_stats_ext_data", where the actual storage would need to have an
ID field (so we'd waste a byte or two duplicating the externally
visible stxkind field inside stxdata). The output function for this
type is just a switch over the existing code. The big advantage of
this way compared to the current approach is that adding a new
ext-stats type requires *zero* work with adding new catalog entries.
Just add another switch case in pg_stats_ext_data_out() and you're
done.

The annoying thing is that this undoes the protections provided by special
data types generated only in internally. It's not possible to generate
e.g. pg_mcv_list values in user code (except for C code, at which points
all bets are off anyway). By abandoning this and reverting to bytea anyone
could craft a bytea and claim it's a statistic value.

That would have been true of the original proposal, but not of this
modified one.

Oh, right. It still has the disadvantage that it obfuscates the actual
data stored in the pg_stats_ext_data (or whatever would it be called),
so e.g. functions would have to do additional checks to make sure it
actually is the right statistic type. For example pg_mcv_list_items()
could not rely on receiving pg_mcv_list values, as per the signature,
but would have to check the value.

Of course, I don't expect to have too many such functions, but overall
this approach with a single type feels a bit too like EAV to my taste.

I think Dean is right we should not expect many more statistic types
than what we already have - a histogram, and perhaps one or two more. So
I agree with Dean the current design with separate statistic types is
not such a big issue.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#23

Dean Rasheed

dean.a.rasheed@gmail.com

over 6 years ago

In reply to: Tomas Vondra (#22)

Re: Multivariate MCV stats can leak data to unprivileged users

On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Oh, right. It still has the disadvantage that it obfuscates the actual
data stored in the pg_stats_ext_data (or whatever would it be called),
so e.g. functions would have to do additional checks to make sure it
actually is the right statistic type. For example pg_mcv_list_items()
could not rely on receiving pg_mcv_list values, as per the signature,
but would have to check the value.

Yes. In fact, since the user-accessible view would want to expose
datatypes specific to the stats kinds rather than bytea or cstring
values, we would need SQL-callable conversion functions for each kind:

* to_pg_ndistinct(pg_extended_stats_ext_data) returns pg_ndistinct
* to_pg_dependencies(pg_extended_stats_ext_data) returns pg_dependencies
* to_pg_mcv(pg_extended_stats_ext_data) returns pg_mcv
* ...

and each of these would throw an error if it weren't given an extended
stats object of the right kind. Then to extract MCV data, you'd have
to do pg_mcv_list_items(to_pg_mcv(ext_data)), and presumably there'd
be something similar for histograms.

IMO, that's not a nice model, compared to just having columns of the
right types in the first place.

Also this model presupposes that all future stats kinds are most
conveniently represented in a single column, but maybe that won't be
the case. It's conceivable that a future stats kind would benefit from
splitting its data across multiple columns.

Of course, I don't expect to have too many such functions, but overall
this approach with a single type feels a bit too like EAV to my taste.

Yes, I think it is an EAV model. I think EAV models do have their
place, but I think that's largely where adding new columns is a common
operation and involves adding little to no extra code. I don't think
either of those is true for extended stats. What we've seen over the
last couple of years is that adding each new stats kind is a large
undertaking, involving lots of new code. That alone is going to limit
just how many ever get added, and compared to that effort, adding new
columns to the catalog is small fry.

Regards,
Dean

#24

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Dean Rasheed (#23)

Re: Multivariate MCV stats can leak data to unprivileged users

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Oh, right. It still has the disadvantage that it obfuscates the actual
data stored in the pg_stats_ext_data (or whatever would it be called),
so e.g. functions would have to do additional checks to make sure it
actually is the right statistic type. For example pg_mcv_list_items()
could not rely on receiving pg_mcv_list values, as per the signature,
but would have to check the value.

Yes. In fact, since the user-accessible view would want to expose
datatypes specific to the stats kinds rather than bytea or cstring
values, we would need SQL-callable conversion functions for each kind:

It seems like people are willfully misunderstanding my suggestion.
You'd only need *one* conversion function, which would look at the
embedded ID field and then emit the appropriate text representation.
I don't see a reason why we'd have the separate pg_ndistinct etc. types
any more at all.

Also this model presupposes that all future stats kinds are most
conveniently represented in a single column, but maybe that won't be
the case. It's conceivable that a future stats kind would benefit from
splitting its data across multiple columns.

Hm, that's possible I suppose, but it seems a little far-fetched.
You could equally well argue that pg_ndistinct etc. should have been
broken down into smaller types, but we didn't.

Yes, I think it is an EAV model. I think EAV models do have their
place, but I think that's largely where adding new columns is a common
operation and involves adding little to no extra code. I don't think
either of those is true for extended stats. What we've seen over the
last couple of years is that adding each new stats kind is a large
undertaking, involving lots of new code. That alone is going to limit
just how many ever get added, and compared to that effort, adding new
columns to the catalog is small fry.

I can't argue with that --- the make-work is just a small part of the
total. But it's still make-work.

Anyway, it was just a suggestion, and if people don't like it that's
fine. But I don't want it to be rejected on the basis of false
arguments.

regards, tom lane

#25

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Tom Lane (#24)

Re: Multivariate MCV stats can leak data to unprivileged users

On Mon, May 20, 2019 at 09:32:24AM -0400, Tom Lane wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Oh, right. It still has the disadvantage that it obfuscates the actual
data stored in the pg_stats_ext_data (or whatever would it be called),
so e.g. functions would have to do additional checks to make sure it
actually is the right statistic type. For example pg_mcv_list_items()
could not rely on receiving pg_mcv_list values, as per the signature,
but would have to check the value.

Yes. In fact, since the user-accessible view would want to expose
datatypes specific to the stats kinds rather than bytea or cstring
values, we would need SQL-callable conversion functions for each kind:

It seems like people are willfully misunderstanding my suggestion.
You'd only need *one* conversion function, which would look at the
embedded ID field and then emit the appropriate text representation.
I don't see a reason why we'd have the separate pg_ndistinct etc. types
any more at all.

That would however require having input functions, which we currently
don't have. Otherwise people could not process the statistic values using
functions like pg_mcv_list_items(). Which I think is useful.

Of course, we could add input functions, but there was a reasoning for not
having them (similarly to pg_node_tree).

Also this model presupposes that all future stats kinds are most
conveniently represented in a single column, but maybe that won't be
the case. It's conceivable that a future stats kind would benefit from
splitting its data across multiple columns.

Hm, that's possible I suppose, but it seems a little far-fetched.
You could equally well argue that pg_ndistinct etc. should have been
broken down into smaller types, but we didn't.

True. I can't rule out adding such "split" statistic type, but don't think
it's very likely. The extended statistic values tend to be complex and
easier to represent in a single value.

Yes, I think it is an EAV model. I think EAV models do have their
place, but I think that's largely where adding new columns is a common
operation and involves adding little to no extra code. I don't think
either of those is true for extended stats. What we've seen over the
last couple of years is that adding each new stats kind is a large
undertaking, involving lots of new code. That alone is going to limit
just how many ever get added, and compared to that effort, adding new
columns to the catalog is small fry.

I can't argue with that --- the make-work is just a small part of the
total. But it's still make-work.

Anyway, it was just a suggestion, and if people don't like it that's
fine. But I don't want it to be rejected on the basis of false
arguments.

Sure.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#26

Dean Rasheed

dean.a.rasheed@gmail.com

over 6 years ago

In reply to: Tom Lane (#24)

Re: Multivariate MCV stats can leak data to unprivileged users

On Mon, 20 May 2019 at 14:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Oh, right. It still has the disadvantage that it obfuscates the actual
data stored in the pg_stats_ext_data (or whatever would it be called),
so e.g. functions would have to do additional checks to make sure it
actually is the right statistic type. For example pg_mcv_list_items()
could not rely on receiving pg_mcv_list values, as per the signature,
but would have to check the value.

Yes. In fact, since the user-accessible view would want to expose
datatypes specific to the stats kinds rather than bytea or cstring
values, we would need SQL-callable conversion functions for each kind:

It seems like people are willfully misunderstanding my suggestion.

I'm more than capable of inadvertently misunderstanding, without the
need to willfully do so :-)

You'd only need *one* conversion function, which would look at the
embedded ID field and then emit the appropriate text representation.
I don't see a reason why we'd have the separate pg_ndistinct etc. types
any more at all.

Hmm, OK. So then would you also make the user-accessible view agnostic
about the kinds of stats supported in the same way, returning zero or
more rows per STATISTICS object, depending on how many kinds of stats
have been built? That would have the advantage of never needing to
change the view definition again, as more stats kinds are supported.

We'd need to change pg_mcv_list_items() to accept a pg_stats_ext_data
value rather than a pg_mcv value, and it would be the user's
responsibility to call it if they wanted to see the contents of the
MCV list (I was originally thinking that we'd include a call to
pg_mcv_list_items() in the view definition, so that it produced
friendlier looking output, since the default textual representation of
an MCV list is completely opaque, unlike the other stats kinds).
Actually, I can see another advantage to not including
pg_mcv_list_items() in the view definition -- in the future, we may
dream up a better version of pg_mcv_list_items(), like say one that
produced JSON, and then we'd regret using the current function.

Anyway, it was just a suggestion, and if people don't like it that's
fine. But I don't want it to be rejected on the basis of false
arguments.

To be clear, I'm not intentionally rejecting your idea. I'm merely
trying to fully understand the implications.

At this stage, perhaps it would be helpful to prototype something for
comparison.

Regards,
Dean

#27

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Dean Rasheed (#26)

Re: Multivariate MCV stats can leak data to unprivileged users

On Mon, May 20, 2019 at 04:09:24PM +0100, Dean Rasheed wrote:

On Mon, 20 May 2019 at 14:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Oh, right. It still has the disadvantage that it obfuscates the actual
data stored in the pg_stats_ext_data (or whatever would it be called),
so e.g. functions would have to do additional checks to make sure it
actually is the right statistic type. For example pg_mcv_list_items()
could not rely on receiving pg_mcv_list values, as per the signature,
but would have to check the value.

Yes. In fact, since the user-accessible view would want to expose
datatypes specific to the stats kinds rather than bytea or cstring
values, we would need SQL-callable conversion functions for each kind:

It seems like people are willfully misunderstanding my suggestion.

I'm more than capable of inadvertently misunderstanding, without the
need to willfully do so :-)

You'd only need *one* conversion function, which would look at the
embedded ID field and then emit the appropriate text representation.
I don't see a reason why we'd have the separate pg_ndistinct etc. types
any more at all.

Hmm, OK. So then would you also make the user-accessible view agnostic
about the kinds of stats supported in the same way, returning zero or
more rows per STATISTICS object, depending on how many kinds of stats
have been built? That would have the advantage of never needing to
change the view definition again, as more stats kinds are supported.

If I got Tom's proposal right, there'd be only one statistic value in
each pg_stats_ext_data value. It'd be a very thin wrapper, essentially
just the value itself + type flag. So for example if you did

CREATE STATISTICS s (ndistinct, mcv, dependencies) ...

you'd get three rows in pg_statistic_ext_data (assuming all the stats
get actually built).

We'd need to change pg_mcv_list_items() to accept a pg_stats_ext_data
value rather than a pg_mcv value, and it would be the user's
responsibility to call it if they wanted to see the contents of the
MCV list (I was originally thinking that we'd include a call to
pg_mcv_list_items() in the view definition, so that it produced
friendlier looking output, since the default textual representation of
an MCV list is completely opaque, unlike the other stats kinds).
Actually, I can see another advantage to not including
pg_mcv_list_items() in the view definition -- in the future, we may
dream up a better version of pg_mcv_list_items(), like say one that
produced JSON, and then we'd regret using the current function.

Yeah. As I said, it obfuscates the "actual" type of the stats value, so
we can no longer rely on the function machinery to verify the type. All
functions dealing with the "wrapper" type would have to verify it
actually contains the right statistic type.

Anyway, it was just a suggestion, and if people don't like it that's
fine. But I don't want it to be rejected on the basis of false
arguments.

To be clear, I'm not intentionally rejecting your idea. I'm merely
trying to fully understand the implications.

At this stage, perhaps it would be helpful to prototype something for
comparison.

I'll look into that. I'll try to whip something up before pgcon, but I
can't guarantee that :-(

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#28

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Tomas Vondra (#27)

3 attachment(s)

Re: Multivariate MCV stats can leak data to unprivileged users

Hi,

Attached are three patches tweaking the stats - two were already posted
in this thread, the third one is just updating docs.

1) 0001 - split pg_statistic_ext into definition + data

This is pretty much the patch Dean posted some time ago, rebased to
current master (fixing just minor pgindent bitrot).

2) 0002 - update sgml docs to reflect changes from 0001

3) 0003 - define pg_stats_ext view, similar to pg_stats

The question is whether we want to also redesign pg_statistic_ext_data
per Tom's proposal (more about that later), but I think we can treat
that as an additional step on top of 0001. So I propose we get those
changes committed, and then perhaps also switch the data table to the
EAV model.

Barring objections, I'll do that early next week, after cleaning up
those patches a bit more.

One thing I think we should fix is naming of the attributes in the 0001
patch. At the moment both catalogs use "stx" prefix - e.g. "stxkind" is
in pg_statistic_ext, and "stxmcv" is in pg_statistic_ext_data. We should
probably switch to "stxd" in the _data catalog. Opinions?

Now, back to the proposal to split the _data catalog rows to EAV form,
with a new data type replacing the multiple types we have at the moment.
I've started hacking on it today, but the more I work on it the less
useful it seems to me.

My understanding is that with that approach we'd replace the _data
catalog (which currently has one column per statistic type, with a
separate data type) with 1:M generic rows, with a generic data type.
That is, we'd replace this

CREATE TABLE pg_statistic_ext_data (
stxoid OID,
stxdependencies pg_dependencies,
stxndistinct pg_ndistinct,
stxmcv pg_mcv_list,
... histograms ...
);

with something like this:

CREATE TABLE pg_statistiex_ext_data (
stxoid OID,
stxkind CHAR,
stxdata pg_statistic_ext_type
);

where pg_statistic_ext would store all existing statistic types. along
with a "flag" saying which value it actually stored (essentially a copy
of the stxkind column, which we however need to lookup a statistic of a
certain type, without having to detoast the statistic itself).

As I mentioned before, I kinda dislike the fact that this obfuscates the
actual statistic type by hiding it behing the "wrapper" type.

The other thing is that we have to deal with 1:M relationship every time
we (re)build the statistics, or when we need to access them. Now, it may
not be a huge amount of code, but it just seems unnecessary. It would
make sense if we planned to add large number of additional statistic
types, but that seems unlikely - I personally can think of maybe one new
statistic type, but that's about it.

I'll continue working on it and I'll share the results early next week,
after playing with it a bit, but I think we should get the existing
patches committed and then continue discussing this as an additional
improvement.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-split-up-pg_statistics_ext.patchtext/plain; charset=us-asciiDownload

From 74e3f575f52a7ea109adde2bce3bf7e9fd44913e Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Thu, 6 Jun 2019 13:44:14 +0200
Subject: [PATCH 1/3] split up pg_statistics_ext

---
 src/backend/catalog/Makefile                |  2 +-
 src/backend/commands/statscmds.c            | 73 ++++++++++++++++-----
 src/backend/optimizer/util/plancat.c        | 12 +++-
 src/backend/statistics/README.mcv           |  9 ++-
 src/backend/statistics/dependencies.c       |  7 +-
 src/backend/statistics/extended_stats.c     | 61 +++++++++--------
 src/backend/statistics/mcv.c                |  7 +-
 src/backend/statistics/mvdistinct.c         |  7 +-
 src/backend/utils/cache/syscache.c          | 12 ++++
 src/include/catalog/indexing.h              | 17 +++--
 src/include/catalog/pg_statistic_ext.h      |  9 +--
 src/include/catalog/pg_statistic_ext_data.h | 52 +++++++++++++++
 src/include/catalog/toasting.h              |  1 +
 src/include/utils/syscache.h                |  1 +
 src/test/regress/expected/oidjoins.out      |  8 +++
 src/test/regress/expected/sanity_check.out  |  1 +
 src/test/regress/expected/stats_ext.out     | 30 ++++++---
 src/test/regress/sql/oidjoins.sql           |  4 ++
 src/test/regress/sql/stats_ext.sql          | 30 ++++++---
 src/tools/pgindent/typedefs.list            |  2 +
 20 files changed, 256 insertions(+), 89 deletions(-)
 create mode 100644 src/include/catalog/pg_statistic_ext_data.h

diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index f186198fc6..8bece078dd 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -34,7 +34,7 @@ CATALOG_HEADERS := \
 	pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
 	pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
 	pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
-	pg_statistic_ext.h \
+	pg_statistic_ext.h pg_statistic_ext_data.h \
 	pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
 	pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
 	pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 217d3a4533..f863887463 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -23,6 +23,7 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "commands/comment.h"
 #include "commands/defrem.h"
 #include "miscadmin.h"
@@ -67,8 +68,11 @@ CreateStatistics(CreateStatsStmt *stmt)
 	HeapTuple	htup;
 	Datum		values[Natts_pg_statistic_ext];
 	bool		nulls[Natts_pg_statistic_ext];
+	Datum		datavalues[Natts_pg_statistic_ext_data];
+	bool		datanulls[Natts_pg_statistic_ext_data];
 	int2vector *stxkeys;
 	Relation	statrel;
+	Relation	datarel;
 	Relation	rel = NULL;
 	Oid			relid;
 	ObjectAddress parentobject,
@@ -336,11 +340,6 @@ CreateStatistics(CreateStatsStmt *stmt)
 	values[Anum_pg_statistic_ext_stxkeys - 1] = PointerGetDatum(stxkeys);
 	values[Anum_pg_statistic_ext_stxkind - 1] = PointerGetDatum(stxkind);
 
-	/* no statistics built yet */
-	nulls[Anum_pg_statistic_ext_stxndistinct - 1] = true;
-	nulls[Anum_pg_statistic_ext_stxdependencies - 1] = true;
-	nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
-
 	/* insert it into pg_statistic_ext */
 	htup = heap_form_tuple(statrel->rd_att, values, nulls);
 	CatalogTupleInsert(statrel, htup);
@@ -348,6 +347,29 @@ CreateStatistics(CreateStatsStmt *stmt)
 
 	relation_close(statrel, RowExclusiveLock);
 
+	/*
+	 * Also build the pg_statistic_ext_data tuple, to hold the actual
+	 * statistics data.
+	 */
+	datarel = table_open(StatisticExtDataRelationId, RowExclusiveLock);
+
+	memset(datavalues, 0, sizeof(datavalues));
+	memset(datanulls, false, sizeof(datanulls));
+
+	datavalues[Anum_pg_statistic_ext_data_stxoid - 1] = ObjectIdGetDatum(statoid);
+
+	/* no statistics built yet */
+	datanulls[Anum_pg_statistic_ext_data_stxndistinct - 1] = true;
+	datanulls[Anum_pg_statistic_ext_data_stxdependencies - 1] = true;
+	datanulls[Anum_pg_statistic_ext_data_stxmcv - 1] = true;
+
+	/* insert it into pg_statistic_ext_data */
+	htup = heap_form_tuple(datarel->rd_att, datavalues, datanulls);
+	CatalogTupleInsert(datarel, htup);
+	heap_freetuple(htup);
+
+	relation_close(datarel, RowExclusiveLock);
+
 	/*
 	 * Invalidate relcache so that others see the new statistics object.
 	 */
@@ -403,6 +425,23 @@ RemoveStatisticsById(Oid statsOid)
 	Form_pg_statistic_ext statext;
 	Oid			relid;
 
+	/*
+	 * First delete the pg_statistic_ext_data tuple holding the actual
+	 * statistical data.
+	 */
+	relation = table_open(StatisticExtDataRelationId, RowExclusiveLock);
+
+	tup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statsOid));
+
+	if (!HeapTupleIsValid(tup)) /* should not happen */
+		elog(ERROR, "cache lookup failed for statistics data %u", statsOid);
+
+	CatalogTupleDelete(relation, &tup->t_self);
+
+	ReleaseSysCache(tup);
+
+	table_close(relation, RowExclusiveLock);
+
 	/*
 	 * Delete the pg_statistic_ext tuple.  Also send out a cache inval on the
 	 * associated table, so that dependent plans will be rebuilt.
@@ -431,8 +470,8 @@ RemoveStatisticsById(Oid statsOid)
  *
  * This could throw an error if the type change can't be supported.
  * If it can be supported, but the stats must be recomputed, a likely choice
- * would be to set the relevant column(s) of the pg_statistic_ext tuple to
- * null until the next ANALYZE.  (Note that the type change hasn't actually
+ * would be to set the relevant column(s) of the pg_statistic_ext_data tuple
+ * to null until the next ANALYZE.  (Note that the type change hasn't actually
  * happened yet, so one option that's *not* on the table is to recompute
  * immediately.)
  *
@@ -456,11 +495,11 @@ UpdateStatisticsForTypeChange(Oid statsOid, Oid relationOid, int attnum,
 
 	Relation	rel;
 
-	Datum		values[Natts_pg_statistic_ext];
-	bool		nulls[Natts_pg_statistic_ext];
-	bool		replaces[Natts_pg_statistic_ext];
+	Datum		values[Natts_pg_statistic_ext_data];
+	bool		nulls[Natts_pg_statistic_ext_data];
+	bool		replaces[Natts_pg_statistic_ext_data];
 
-	oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statsOid));
+	oldtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statsOid));
 	if (!HeapTupleIsValid(oldtup))
 		elog(ERROR, "cache lookup failed for statistics object %u", statsOid);
 
@@ -479,14 +518,14 @@ UpdateStatisticsForTypeChange(Oid statsOid, Oid relationOid, int attnum,
 	 * OK, we need to reset some statistics. So let's build the new tuple,
 	 * replacing the affected statistics types with NULL.
 	 */
-	memset(nulls, 0, Natts_pg_statistic_ext * sizeof(bool));
-	memset(replaces, 0, Natts_pg_statistic_ext * sizeof(bool));
-	memset(values, 0, Natts_pg_statistic_ext * sizeof(Datum));
+	memset(nulls, 0, Natts_pg_statistic_ext_data * sizeof(bool));
+	memset(replaces, 0, Natts_pg_statistic_ext_data * sizeof(bool));
+	memset(values, 0, Natts_pg_statistic_ext_data * sizeof(Datum));
 
-	replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
-	nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxmcv - 1] = true;
+	nulls[Anum_pg_statistic_ext_data_stxmcv - 1] = true;
 
-	rel = heap_open(StatisticExtRelationId, RowExclusiveLock);
+	rel = heap_open(StatisticExtDataRelationId, RowExclusiveLock);
 
 	/* replace the old tuple */
 	stup = heap_modify_tuple(oldtup,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 2405acbf6f..40f497660d 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1308,6 +1308,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 		Oid			statOid = lfirst_oid(l);
 		Form_pg_statistic_ext staForm;
 		HeapTuple	htup;
+		HeapTuple	dtup;
 		Bitmapset  *keys = NULL;
 		int			i;
 
@@ -1316,6 +1317,10 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 			elog(ERROR, "cache lookup failed for statistics object %u", statOid);
 		staForm = (Form_pg_statistic_ext) GETSTRUCT(htup);
 
+		dtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statOid));
+		if (!HeapTupleIsValid(dtup))
+			elog(ERROR, "cache lookup failed for statistics object %u", statOid);
+
 		/*
 		 * First, build the array of columns covered.  This is ultimately
 		 * wasted if no stats within the object have actually been built, but
@@ -1325,7 +1330,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 			keys = bms_add_member(keys, staForm->stxkeys.values[i]);
 
 		/* add one StatisticExtInfo for each kind built */
-		if (statext_is_kind_built(htup, STATS_EXT_NDISTINCT))
+		if (statext_is_kind_built(dtup, STATS_EXT_NDISTINCT))
 		{
 			StatisticExtInfo *info = makeNode(StatisticExtInfo);
 
@@ -1337,7 +1342,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 			stainfos = lcons(info, stainfos);
 		}
 
-		if (statext_is_kind_built(htup, STATS_EXT_DEPENDENCIES))
+		if (statext_is_kind_built(dtup, STATS_EXT_DEPENDENCIES))
 		{
 			StatisticExtInfo *info = makeNode(StatisticExtInfo);
 
@@ -1349,7 +1354,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 			stainfos = lcons(info, stainfos);
 		}
 
-		if (statext_is_kind_built(htup, STATS_EXT_MCV))
+		if (statext_is_kind_built(dtup, STATS_EXT_MCV))
 		{
 			StatisticExtInfo *info = makeNode(StatisticExtInfo);
 
@@ -1362,6 +1367,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 		}
 
 		ReleaseSysCache(htup);
+		ReleaseSysCache(dtup);
 		bms_free(keys);
 	}
 
diff --git a/src/backend/statistics/README.mcv b/src/backend/statistics/README.mcv
index c18878f5d2..4733341112 100644
--- a/src/backend/statistics/README.mcv
+++ b/src/backend/statistics/README.mcv
@@ -86,11 +86,14 @@ So instead the MCV lists are stored in a custom data type (pg_mcv_list),
 which however makes it more difficult to inspect the contents. To make that
 easier, there's a SRF returning detailed information about the MCV lists.
 
-    SELECT m.* FROM pg_statistic_ext,
-                    pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts2';
+    SELECT m.* FROM pg_statistic_ext s,
+                    pg_statistic_ext_data d,
+                    pg_mcv_list_items(stxmcv) m
+              WHERE s.stxname = 'stts2'
+                AND d.stxoid = s.oid;
 
 It accepts one parameter - a pg_mcv_list value (which can only be obtained
-from pg_statistic_ext catalog, to defend against malicious input), and
+from pg_statistic_ext_data catalog, to defend against malicious input), and
 returns these columns:
 
     - item index (0, ..., (nitems-1))
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index cd318faf3b..b4bde517ae 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -17,6 +17,7 @@
 #include "access/sysattr.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "lib/stringinfo.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
@@ -637,12 +638,12 @@ statext_dependencies_load(Oid mvoid)
 	Datum		deps;
 	HeapTuple	htup;
 
-	htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
+	htup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(mvoid));
 	if (!HeapTupleIsValid(htup))
 		elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
 
-	deps = SysCacheGetAttr(STATEXTOID, htup,
-						   Anum_pg_statistic_ext_stxdependencies, &isnull);
+	deps = SysCacheGetAttr(STATEXTDATASTXOID, htup,
+						   Anum_pg_statistic_ext_data_stxdependencies, &isnull);
 	if (isnull)
 		elog(ERROR,
 			 "requested statistic kind \"%c\" is not yet built for statistics object %u",
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index ab187915c1..ed24231fd9 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -23,6 +23,7 @@
 #include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
@@ -65,9 +66,9 @@ typedef struct StatExtEntry
 static List *fetch_statentries_for_relation(Relation pg_statext, Oid relid);
 static VacAttrStats **lookup_var_attr_stats(Relation rel, Bitmapset *attrs,
 											int nvacatts, VacAttrStats **vacatts);
-static void statext_store(Relation pg_stext, Oid relid,
+static void statext_store(Oid relid,
 						  MVNDistinct *ndistinct, MVDependencies *dependencies,
-						  MCVList *mcvlist, VacAttrStats **stats);
+						  MCVList *mcv, VacAttrStats **stats);
 
 
 /*
@@ -145,7 +146,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 		}
 
 		/* store the statistics in the catalog */
-		statext_store(pg_stext, stat->statOid, ndistinct, dependencies, mcv, stats);
+		statext_store(stat->statOid, ndistinct, dependencies, mcv, stats);
 	}
 
 	table_close(pg_stext, RowExclusiveLock);
@@ -156,7 +157,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 
 /*
  * statext_is_kind_built
- *		Is this stat kind built in the given pg_statistic_ext tuple?
+ *		Is this stat kind built in the given pg_statistic_ext_data tuple?
  */
 bool
 statext_is_kind_built(HeapTuple htup, char type)
@@ -166,15 +167,15 @@ statext_is_kind_built(HeapTuple htup, char type)
 	switch (type)
 	{
 		case STATS_EXT_NDISTINCT:
-			attnum = Anum_pg_statistic_ext_stxndistinct;
+			attnum = Anum_pg_statistic_ext_data_stxndistinct;
 			break;
 
 		case STATS_EXT_DEPENDENCIES:
-			attnum = Anum_pg_statistic_ext_stxdependencies;
+			attnum = Anum_pg_statistic_ext_data_stxdependencies;
 			break;
 
 		case STATS_EXT_MCV:
-			attnum = Anum_pg_statistic_ext_stxmcv;
+			attnum = Anum_pg_statistic_ext_data_stxmcv;
 			break;
 
 		default:
@@ -312,70 +313,76 @@ lookup_var_attr_stats(Relation rel, Bitmapset *attrs,
 
 /*
  * statext_store
- *	Serializes the statistics and stores them into the pg_statistic_ext tuple.
+ *	Serializes the statistics and stores them into the pg_statistic_ext_data
+ *	tuple.
  */
 static void
-statext_store(Relation pg_stext, Oid statOid,
+statext_store(Oid statOid,
 			  MVNDistinct *ndistinct, MVDependencies *dependencies,
 			  MCVList *mcv, VacAttrStats **stats)
 {
 	HeapTuple	stup,
 				oldtup;
-	Datum		values[Natts_pg_statistic_ext];
-	bool		nulls[Natts_pg_statistic_ext];
-	bool		replaces[Natts_pg_statistic_ext];
+	Datum		values[Natts_pg_statistic_ext_data];
+	bool		nulls[Natts_pg_statistic_ext_data];
+	bool		replaces[Natts_pg_statistic_ext_data];
+	Relation	pg_stextdata;
 
 	memset(nulls, true, sizeof(nulls));
 	memset(replaces, false, sizeof(replaces));
 	memset(values, 0, sizeof(values));
 
 	/*
-	 * Construct a new pg_statistic_ext tuple, replacing the calculated stats.
+	 * Construct a new pg_statistic_ext_data tuple, replacing the calculated
+	 * stats.
 	 */
 	if (ndistinct != NULL)
 	{
 		bytea	   *data = statext_ndistinct_serialize(ndistinct);
 
-		nulls[Anum_pg_statistic_ext_stxndistinct - 1] = (data == NULL);
-		values[Anum_pg_statistic_ext_stxndistinct - 1] = PointerGetDatum(data);
+		nulls[Anum_pg_statistic_ext_data_stxndistinct - 1] = (data == NULL);
+		values[Anum_pg_statistic_ext_data_stxndistinct - 1] = PointerGetDatum(data);
 	}
 
 	if (dependencies != NULL)
 	{
 		bytea	   *data = statext_dependencies_serialize(dependencies);
 
-		nulls[Anum_pg_statistic_ext_stxdependencies - 1] = (data == NULL);
-		values[Anum_pg_statistic_ext_stxdependencies - 1] = PointerGetDatum(data);
+		nulls[Anum_pg_statistic_ext_data_stxdependencies - 1] = (data == NULL);
+		values[Anum_pg_statistic_ext_data_stxdependencies - 1] = PointerGetDatum(data);
 	}
-
 	if (mcv != NULL)
 	{
 		bytea	   *data = statext_mcv_serialize(mcv, stats);
 
-		nulls[Anum_pg_statistic_ext_stxmcv - 1] = (data == NULL);
-		values[Anum_pg_statistic_ext_stxmcv - 1] = PointerGetDatum(data);
+		nulls[Anum_pg_statistic_ext_data_stxmcv - 1] = (data == NULL);
+		values[Anum_pg_statistic_ext_data_stxmcv - 1] = PointerGetDatum(data);
 	}
 
 	/* always replace the value (either by bytea or NULL) */
-	replaces[Anum_pg_statistic_ext_stxndistinct - 1] = true;
-	replaces[Anum_pg_statistic_ext_stxdependencies - 1] = true;
-	replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxndistinct - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxdependencies - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxmcv - 1] = true;
 
-	/* there should already be a pg_statistic_ext tuple */
-	oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statOid));
+	/* there should already be a pg_statistic_ext_data tuple */
+	oldtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statOid));
 	if (!HeapTupleIsValid(oldtup))
 		elog(ERROR, "cache lookup failed for statistics object %u", statOid);
 
 	/* replace it */
+	pg_stextdata = table_open(StatisticExtDataRelationId, RowExclusiveLock);
+
 	stup = heap_modify_tuple(oldtup,
-							 RelationGetDescr(pg_stext),
+							 RelationGetDescr(pg_stextdata),
 							 values,
 							 nulls,
 							 replaces);
 	ReleaseSysCache(oldtup);
-	CatalogTupleUpdate(pg_stext, &stup->t_self, stup);
+	CatalogTupleUpdate(pg_stextdata, &stup->t_self, stup);
 
 	heap_freetuple(stup);
+
+	table_close(pg_stextdata, RowExclusiveLock);
 }
 
 /* initialize multi-dimensional sort */
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index d1f0fd55e8..56f3b183a1 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -19,6 +19,7 @@
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "fmgr.h"
 #include "funcapi.h"
 #include "nodes/nodeFuncs.h"
@@ -429,13 +430,13 @@ statext_mcv_load(Oid mvoid)
 	MCVList    *result;
 	bool		isnull;
 	Datum		mcvlist;
-	HeapTuple	htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
+	HeapTuple	htup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(mvoid));
 
 	if (!HeapTupleIsValid(htup))
 		elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
 
-	mcvlist = SysCacheGetAttr(STATEXTOID, htup,
-							  Anum_pg_statistic_ext_stxmcv, &isnull);
+	mcvlist = SysCacheGetAttr(STATEXTDATASTXOID, htup,
+							  Anum_pg_statistic_ext_data_stxmcv, &isnull);
 
 	if (isnull)
 		elog(ERROR,
diff --git a/src/backend/statistics/mvdistinct.c b/src/backend/statistics/mvdistinct.c
index 7432a6a396..10fb82d931 100644
--- a/src/backend/statistics/mvdistinct.c
+++ b/src/backend/statistics/mvdistinct.c
@@ -27,6 +27,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "lib/stringinfo.h"
@@ -145,12 +146,12 @@ statext_ndistinct_load(Oid mvoid)
 	Datum		ndist;
 	HeapTuple	htup;
 
-	htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
+	htup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(mvoid));
 	if (!HeapTupleIsValid(htup))
 		elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
 
-	ndist = SysCacheGetAttr(STATEXTOID, htup,
-							Anum_pg_statistic_ext_stxndistinct, &isnull);
+	ndist = SysCacheGetAttr(STATEXTDATASTXOID, htup,
+							Anum_pg_statistic_ext_data_stxndistinct, &isnull);
 	if (isnull)
 		elog(ERROR,
 			 "requested statistic kind \"%c\" is not yet built for statistics object %u",
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 476538354d..99976468e5 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -62,6 +62,7 @@
 #include "catalog/pg_replication_origin.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "catalog/pg_subscription.h"
 #include "catalog/pg_subscription_rel.h"
 #include "catalog/pg_tablespace.h"
@@ -727,6 +728,17 @@ static const struct cachedesc cacheinfo[] = {
 		},
 		32
 	},
+	{StatisticExtDataRelationId, /* STATEXTDATASTXOID */
+		StatisticExtDataStxoidIndexId,
+		1,
+		{
+			Anum_pg_statistic_ext_data_stxoid,
+			0,
+			0,
+			0
+		},
+		4
+	},
 	{StatisticExtRelationId,	/* STATEXTNAMENSP */
 		StatisticExtNameIndexId,
 		2,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 5f2aee8a4a..ef4445b017 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -186,13 +186,6 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
 DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
 #define LargeObjectMetadataOidIndexId	2996
 
-DECLARE_UNIQUE_INDEX(pg_statistic_ext_oid_index, 3380, on pg_statistic_ext using btree(oid oid_ops));
-#define StatisticExtOidIndexId	3380
-DECLARE_UNIQUE_INDEX(pg_statistic_ext_name_index, 3997, on pg_statistic_ext using btree(stxname name_ops, stxnamespace oid_ops));
-#define StatisticExtNameIndexId 3997
-DECLARE_INDEX(pg_statistic_ext_relid_index, 3379, on pg_statistic_ext using btree(stxrelid oid_ops));
-#define StatisticExtRelidIndexId 3379
-
 DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
 #define NamespaceNameIndexId  2684
 DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
@@ -237,6 +230,16 @@ DECLARE_INDEX(pg_shdepend_reference_index, 1233, on pg_shdepend using btree(refc
 DECLARE_UNIQUE_INDEX(pg_statistic_relid_att_inh_index, 2696, on pg_statistic using btree(starelid oid_ops, staattnum int2_ops, stainherit bool_ops));
 #define StatisticRelidAttnumInhIndexId	2696
 
+DECLARE_UNIQUE_INDEX(pg_statistic_ext_oid_index, 3380, on pg_statistic_ext using btree(oid oid_ops));
+#define StatisticExtOidIndexId	3380
+DECLARE_UNIQUE_INDEX(pg_statistic_ext_name_index, 3997, on pg_statistic_ext using btree(stxname name_ops, stxnamespace oid_ops));
+#define StatisticExtNameIndexId 3997
+DECLARE_INDEX(pg_statistic_ext_relid_index, 3379, on pg_statistic_ext using btree(stxrelid oid_ops));
+#define StatisticExtRelidIndexId 3379
+
+DECLARE_UNIQUE_INDEX(pg_statistic_ext_data_stxoid_index, 3433, on pg_statistic_ext_data using btree(stxoid oid_ops));
+#define StatisticExtDataStxoidIndexId 3433
+
 DECLARE_UNIQUE_INDEX(pg_tablespace_oid_index, 2697, on pg_tablespace using btree(oid oid_ops));
 #define TablespaceOidIndexId  2697
 DECLARE_UNIQUE_INDEX(pg_tablespace_spcname_index, 2698, on pg_tablespace using btree(spcname name_ops));
diff --git a/src/include/catalog/pg_statistic_ext.h b/src/include/catalog/pg_statistic_ext.h
index e449f9efe8..d8c5e0651e 100644
--- a/src/include/catalog/pg_statistic_ext.h
+++ b/src/include/catalog/pg_statistic_ext.h
@@ -1,8 +1,12 @@
 /*-------------------------------------------------------------------------
  *
  * pg_statistic_ext.h
- *	  definition of the "extended statistics" system catalog (pg_statistic_ext)
+ *	  definition of the "extended statistics" system catalog
+ *	  (pg_statistic_ext)
  *
+ * Note that pg_statistic_ext contains the definitions of extended statistics
+ * objects, created by CREATE STATISTICS, but not the actual statistical data,
+ * created by running ANALYZE.
  *
  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -47,9 +51,6 @@ CATALOG(pg_statistic_ext,3381,StatisticExtRelationId)
 #ifdef CATALOG_VARLEN
 	char		stxkind[1] BKI_FORCE_NOT_NULL;	/* statistics kinds requested
 												 * to build */
-	pg_ndistinct stxndistinct;	/* ndistinct coefficients (serialized) */
-	pg_dependencies stxdependencies;	/* dependencies (serialized) */
-	pg_mcv_list stxmcv;			/* MCV (serialized) */
 #endif
 
 } FormData_pg_statistic_ext;
diff --git a/src/include/catalog/pg_statistic_ext_data.h b/src/include/catalog/pg_statistic_ext_data.h
new file mode 100644
index 0000000000..76a38d7753
--- /dev/null
+++ b/src/include/catalog/pg_statistic_ext_data.h
@@ -0,0 +1,52 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_statistic_ext_data.h
+ *	  definition of the "extended statistics data" system catalog
+ *	  (pg_statistic_ext_data)
+ *
+ * This catalog stores the statistical data for extended statistics objects.
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_statistic_ext_data.h
+ *
+ * NOTES
+ *	  The Catalog.pm module reads this file and derives schema
+ *	  information.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_STATISTIC_EXT_DATA_H
+#define PG_STATISTIC_EXT_DATA_H
+
+#include "catalog/genbki.h"
+#include "catalog/pg_statistic_ext_data_d.h"
+
+/* ----------------
+ *		pg_statistic_ext_data definition.  cpp turns this into
+ *		typedef struct FormData_pg_statistic_ext_data
+ * ----------------
+ */
+CATALOG(pg_statistic_ext_data,3429,StatisticExtDataRelationId)
+{
+	Oid			stxoid;			/* statistics object this data is for */
+
+#ifdef CATALOG_VARLEN			/* variable-length fields start here */
+
+	pg_ndistinct stxndistinct;	/* ndistinct coefficients (serialized) */
+	pg_dependencies stxdependencies;	/* dependencies (serialized) */
+	pg_mcv_list stxmcv;			/* MCV (serialized) */
+
+#endif
+
+} FormData_pg_statistic_ext_data;
+
+/* ----------------
+ *		Form_pg_statistic_ext_data corresponds to a pointer to a tuple with
+ *		the format of pg_statistic_ext_data relation.
+ * ----------------
+ */
+typedef FormData_pg_statistic_ext_data *Form_pg_statistic_ext_data;
+
+#endif							/* PG_STATISTIC_EXT_DATA_H */
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index a9e633d6bb..cc5dfed0bf 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -70,6 +70,7 @@ DECLARE_TOAST(pg_rewrite, 2838, 2839);
 DECLARE_TOAST(pg_seclabel, 3598, 3599);
 DECLARE_TOAST(pg_statistic, 2840, 2841);
 DECLARE_TOAST(pg_statistic_ext, 3439, 3440);
+DECLARE_TOAST(pg_statistic_ext_data, 3430, 3431);
 DECLARE_TOAST(pg_trigger, 2336, 2337);
 DECLARE_TOAST(pg_ts_dict, 4169, 4170);
 DECLARE_TOAST(pg_type, 4171, 4172);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index a6307474ee..918765cc99 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -86,6 +86,7 @@ enum SysCacheIdentifier
 	REPLORIGNAME,
 	RULERELNAME,
 	SEQRELID,
+	STATEXTDATASTXOID,
 	STATEXTNAMENSP,
 	STATEXTOID,
 	STATRELATTINH,
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 4edc8175aa..1302cc271b 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -985,6 +985,14 @@ WHERE	stxowner != 0 AND
 ------+----------
 (0 rows)
 
+SELECT	ctid, stxoid
+FROM	pg_catalog.pg_statistic_ext_data fk
+WHERE	stxoid != 0 AND
+	NOT EXISTS(SELECT 1 FROM pg_catalog.pg_statistic_ext pk WHERE pk.oid = fk.stxoid);
+ ctid | stxoid 
+------+--------
+(0 rows)
+
 SELECT	ctid, spcowner
 FROM	pg_catalog.pg_tablespace fk
 WHERE	spcowner != 0 AND
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index 392e8a4957..8ff0da185e 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -149,6 +149,7 @@ pg_shdescription|t
 pg_shseclabel|t
 pg_statistic|t
 pg_statistic_ext|t
+pg_statistic_ext_data|t
 pg_subscription|t
 pg_subscription_rel|t
 pg_tablespace|t
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 6dfca7a606..afdd14a7ec 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -199,8 +199,10 @@ SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY b, c
 -- correct command
 CREATE STATISTICS s10 ON a, b, c FROM ndistinct;
 ANALYZE ndistinct;
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
  stxkind |                    stxndistinct                     
 ---------+-----------------------------------------------------
  {d,f,m} | {"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}
@@ -246,8 +248,10 @@ INSERT INTO ndistinct (a, b, c, filler1)
             cash_words(mod(i,33)::int::money)
        FROM generate_series(1,5000) s(i);
 ANALYZE ndistinct;
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
  stxkind |                        stxndistinct                        
 ---------+------------------------------------------------------------
  {d,f,m} | {"3, 4": 2550, "3, 6": 800, "4, 6": 1632, "3, 4, 6": 5000}
@@ -285,8 +289,10 @@ SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, d
 (1 row)
 
 DROP STATISTICS s10;
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
  stxkind | stxndistinct 
 ---------+--------------
 (0 rows)
@@ -537,7 +543,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= 4 AND b <
 
 -- check change of unrelated column type does not reset the MCV statistics
 ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
-SELECT stxmcv IS NOT NULL FROM pg_statistic_ext WHERE stxname = 'mcv_lists_stats';
+SELECT d.stxmcv IS NOT NULL
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
  ?column? 
 ----------
  t
@@ -600,8 +609,11 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
 TRUNCATE mcv_lists;
 INSERT INTO mcv_lists (a, b, c) SELECT 1, 2, 3 FROM generate_series(1,1000) s(i);
 ANALYZE mcv_lists;
-SELECT m.* FROM pg_statistic_ext,
-              pg_mcv_list_items(stxmcv) m WHERE stxname = 'mcv_lists_stats';
+SELECT m.*
+  FROM pg_statistic_ext s, pg_statistic_ext_data d,
+       pg_mcv_list_items(d.stxmcv) m
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
  index |  values   |  nulls  | frequency | base_frequency 
 -------+-----------+---------+-----------+----------------
      0 | {1, 2, 3} | {f,f,f} |         1 |              1
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index dbe4a5857d..b774cbca5b 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -493,6 +493,10 @@ SELECT	ctid, stxowner
 FROM	pg_catalog.pg_statistic_ext fk
 WHERE	stxowner != 0 AND
 	NOT EXISTS(SELECT 1 FROM pg_catalog.pg_authid pk WHERE pk.oid = fk.stxowner);
+SELECT	ctid, stxoid
+FROM	pg_catalog.pg_statistic_ext_data fk
+WHERE	stxoid != 0 AND
+	NOT EXISTS(SELECT 1 FROM pg_catalog.pg_statistic_ext pk WHERE pk.oid = fk.stxoid);
 SELECT	ctid, spcowner
 FROM	pg_catalog.pg_tablespace fk
 WHERE	spcowner != 0 AND
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index c6a5776120..7454f643c2 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -144,8 +144,10 @@ CREATE STATISTICS s10 ON a, b, c FROM ndistinct;
 
 ANALYZE ndistinct;
 
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
 
 -- Hash Aggregate, thanks to estimates improved by the statistic
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, b');
@@ -170,8 +172,10 @@ INSERT INTO ndistinct (a, b, c, filler1)
 
 ANALYZE ndistinct;
 
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
 
 -- correct esimates
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, b');
@@ -186,8 +190,10 @@ SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, d
 
 DROP STATISTICS s10;
 
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
 
 -- dropping the statistics results in under-estimates
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, b');
@@ -335,7 +341,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= 4 AND b <
 -- check change of unrelated column type does not reset the MCV statistics
 ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
 
-SELECT stxmcv IS NOT NULL FROM pg_statistic_ext WHERE stxname = 'mcv_lists_stats';
+SELECT d.stxmcv IS NOT NULL
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
 
 -- check change of column type resets the MCV statistics
 ALTER TABLE mcv_lists ALTER COLUMN c TYPE numeric;
@@ -378,8 +387,11 @@ TRUNCATE mcv_lists;
 INSERT INTO mcv_lists (a, b, c) SELECT 1, 2, 3 FROM generate_series(1,1000) s(i);
 ANALYZE mcv_lists;
 
-SELECT m.* FROM pg_statistic_ext,
-              pg_mcv_list_items(stxmcv) m WHERE stxname = 'mcv_lists_stats';
+SELECT m.*
+  FROM pg_statistic_ext s, pg_statistic_ext_data d,
+       pg_mcv_list_items(d.stxmcv) m
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
 
 -- mcv with arrays
 CREATE TABLE mcv_lists_arrays (
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8cc033eb13..bdcbc8d15e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -729,6 +729,7 @@ FormData_pg_sequence_data
 FormData_pg_shdepend
 FormData_pg_statistic
 FormData_pg_statistic_ext
+FormData_pg_statistic_ext_data
 FormData_pg_subscription
 FormData_pg_subscription_rel
 FormData_pg_tablespace
@@ -786,6 +787,7 @@ Form_pg_sequence_data
 Form_pg_shdepend
 Form_pg_statistic
 Form_pg_statistic_ext
+Form_pg_statistic_ext_data
 Form_pg_subscription
 Form_pg_subscription_rel
 Form_pg_tablespace
-- 
2.20.1

0002-update-docs-after-splitting-stats.patchtext/plain; charset=us-asciiDownload

From ee6263b1338a219c25892e8321688701dd608b8c Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Thu, 6 Jun 2019 16:08:49 +0200
Subject: [PATCH 2/3] update docs after splitting stats

---
 doc/src/sgml/catalogs.sgml  | 62 ++++++++++++++++++++++++++++++++-----
 doc/src/sgml/func.sgml      |  4 +--
 doc/src/sgml/perform.sgml   | 12 ++++---
 doc/src/sgml/planstats.sgml |  2 +-
 4 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 36193d1491..a17e22540c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -297,7 +297,12 @@
 
      <row>
       <entry><link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link></entry>
-      <entry>extended planner statistics</entry>
+      <entry>extended planner statistics (definition)</entry>
+     </row>
+
+     <row>
+      <entry><link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link></entry>
+      <entry>extended planner statistics (built statistics)</entry>
      </row>
 
      <row>
@@ -6581,6 +6586,55 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </entry>
      </row>
 
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structfield>stxkind</structfield> field is filled at creation of the
+   statistics object, indicating which statistic type(s) are desired. The
+   statistics (once computed by <command>ANALYZE</command>) are stored in
+   <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link>
+   catalog.
+  </para>
+ </sect1>
+
+ <sect1 id="catalog-pg-statistic-ext-data">
+  <title><structname>pg_statistic_ext_data</structname></title>
+
+  <indexterm zone="catalog-pg-statistic-ext">
+   <primary>pg_statistic_ext</primary>
+  </indexterm>
+
+  <para>
+   The catalog <structname>pg_statistic_ext</structname>
+   holds extended planner statistics.
+   Each row in this catalog corresponds to a <firstterm>statistics object</firstterm>
+   created with <xref linkend="sql-createstatistics"/>.
+  </para>
+
+  <table>
+   <title><structname>pg_statistic_ext_data</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+    <tbody>
+
+     <row>
+      <entry><structfield>stxoid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link>.oid</literal></entry>
+      <entry>Extended statistic containing the definition for this data.</entry>
+     </row>
+
      <row>
       <entry><structfield>stxndistinct</structfield></entry>
       <entry><type>pg_ndistinct</type></entry>
@@ -6614,12 +6668,6 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
    </tgroup>
   </table>
 
-  <para>
-   The <structfield>stxkind</structfield> field is filled at creation of the
-   statistics object, indicating which statistic type(s) are desired.
-   The fields after it are initially NULL and are filled only when the
-   corresponding statistic has been computed by <command>ANALYZE</command>.
-  </para>
  </sect1>
 
  <sect1 id="catalog-pg-subscription">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8debb8cbbc..6bdbba74dd 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22385,12 +22385,12 @@ CREATE EVENT TRIGGER test_table_rewrite_oid
     The <function>pg_mcv_list_items</function> function can be used like this:
 
 <programlisting>
-SELECT m.* FROM pg_statistic_ext,
+SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
                 pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts';
 </programlisting>
 
      Values of the <type>pg_mcv_list</type> can be obtained only from the
-     <literal>pg_statistic_ext.stxmcv</literal> column.
+     <literal>pg_statistic_ext_data.stxmcv</literal> column.
    </para>
   </sect2>
 
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index a84be85159..087b0e555f 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1076,6 +1076,10 @@ WHERE tablename = 'road';
     <primary>pg_statistic_ext</primary>
    </indexterm>
 
+   <indexterm>
+    <primary>pg_statistic_ext_data</primary>
+   </indexterm>
+
    <para>
     It is common to see slow queries running bad execution plans because
     multiple columns used in the query clauses are correlated.
@@ -1104,7 +1108,7 @@ WHERE tablename = 'road';
     interest in the statistics.  Actual data collection is performed
     by <command>ANALYZE</command> (either a manual command, or background
     auto-analyze).  The collected values can be examined in the
-    <link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link>
+    <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link>
     catalog.
    </para>
 
@@ -1173,7 +1177,7 @@ CREATE STATISTICS stts (dependencies) ON zip, city FROM zipcodes;
 ANALYZE zipcodes;
 
 SELECT stxname, stxkeys, stxdependencies
-  FROM pg_statistic_ext
+  FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid)
   WHERE stxname = 'stts';
  stxname | stxkeys |             stxdependencies               
 ---------+---------+------------------------------------------
@@ -1263,7 +1267,7 @@ CREATE STATISTICS stts2 (ndistinct) ON zip, state, city FROM zipcodes;
 ANALYZE zipcodes;
 
 SELECT stxkeys AS k, stxndistinct AS nd
-  FROM pg_statistic_ext
+  FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid)
   WHERE stxname = 'stts2';
 -[ RECORD 1 ]--------------------------------------------------------
 k  | 1 2 5
@@ -1317,7 +1321,7 @@ CREATE STATISTICS stts3 (mcv) ON state, city FROM zipcodes;
 
 ANALYZE zipcodes;
 
-SELECT m.* FROM pg_statistic_ext,
+SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
                 pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts3';
 
  index |         values         | nulls | frequency | base_frequency 
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 4b1d3f4952..f85f91dd13 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -635,7 +635,7 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;
     <function>pg_mcv_list_items</function> set-returning function.
 
 <programlisting>
-SELECT m.* FROM pg_statistic_ext,
+SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
                 pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts2';
  index |  values  | nulls | frequency | base_frequency 
 -------+----------+-------+-----------+----------------
-- 
2.20.1

0003-add-pg_stats_ext-view.patchtext/plain; charset=us-asciiDownload

From 289b425db32d739a2c0609f1b4b8fca1b465fac2 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Thu, 6 Jun 2019 18:40:50 +0200
Subject: [PATCH 3/3] add pg_stats_ext view

---
 src/backend/catalog/system_views.sql | 41 ++++++++++++++++++++++++++++
 src/test/regress/expected/rules.out  | 29 ++++++++++++++++++++
 2 files changed, 70 insertions(+)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 78a103cdb9..5bcf24ee84 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -253,6 +253,47 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
 
 REVOKE ALL on pg_statistic FROM public;
 
+CREATE VIEW pg_stats_ext WITH (security_barrier) AS
+    SELECT cn.nspname AS schemaname,
+           c.relname AS tablename,
+           sn.nspname AS statistics_schemaname,
+           s.stxname AS statistics_name,
+           pg_get_userbyid(c.relowner) AS statistics_owner,
+           ( SELECT array_agg(a.attname ORDER BY a.attnum)
+             FROM unnest(s.stxkeys) k
+                  JOIN pg_attribute a
+                       ON (a.attrelid = s.stxrelid AND a.attnum = k)
+           ) AS attnames,
+           s.stxkind AS kinds,
+           sd.stxndistinct AS n_distinct,
+           sd.stxdependencies AS dependencies,
+           m.most_common_vals,
+           m.most_common_val_nulls,
+           m.most_common_freqs,
+           m.most_common_base_freqs
+    FROM pg_statistic_ext s JOIN pg_class c ON (c.oid = s.stxrelid)
+         JOIN pg_statistic_ext_data sd ON (s.oid = sd.stxoid)
+         LEFT JOIN pg_namespace cn ON (cn.oid = c.relnamespace)
+         LEFT JOIN pg_namespace sn ON (sn.oid = s.stxnamespace)
+         LEFT JOIN LATERAL
+                   ( SELECT array_agg(values) AS most_common_vals,
+                            array_agg(nulls) AS most_common_val_nulls,
+                            array_agg(frequency) AS most_common_freqs,
+                            array_agg(base_frequency) AS most_common_base_freqs
+                     FROM pg_mcv_list_items(sd.stxmcv)
+                   ) m ON sd.stxmcv IS NOT NULL
+    WHERE NOT EXISTS
+              ( SELECT 1
+                FROM unnest(stxkeys) k
+                     JOIN pg_attribute a
+                          ON (a.attrelid = s.stxrelid AND a.attnum = k)
+                WHERE NOT has_column_privilege(c.oid, a.attnum, 'select') )
+    AND (c.relrowsecurity = false OR NOT row_security_active(c.oid));
+
+REVOKE ALL on pg_statistic_ext FROM public;
+GRANT SELECT (oid, stxrelid, stxname, stxnamespace, stxowner, stxkeys, stxkind)
+    ON pg_statistic_ext TO public;
+
 CREATE VIEW pg_publication_tables AS
     SELECT
         P.pubname AS pubname,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 7d365c48d1..80faf46648 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2284,6 +2284,35 @@ pg_stats| SELECT n.nspname AS schemaname,
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
   WHERE ((NOT a.attisdropped) AND has_column_privilege(c.oid, a.attnum, 'select'::text) AND ((c.relrowsecurity = false) OR (NOT row_security_active(c.oid))));
+pg_stats_ext| SELECT cn.nspname AS schemaname,
+    c.relname AS tablename,
+    sn.nspname AS statistics_schemaname,
+    s.stxname AS statistics_name,
+    pg_get_userbyid(c.relowner) AS statistics_owner,
+    ( SELECT array_agg(a.attname ORDER BY a.attnum) AS array_agg
+           FROM (unnest(s.stxkeys) k(k)
+             JOIN pg_attribute a ON (((a.attrelid = s.stxrelid) AND (a.attnum = k.k))))) AS attnames,
+    s.stxkind AS kinds,
+    sd.stxndistinct AS n_distinct,
+    sd.stxdependencies AS dependencies,
+    m.most_common_vals,
+    m.most_common_val_nulls,
+    m.most_common_freqs,
+    m.most_common_base_freqs
+   FROM (((((pg_statistic_ext s
+     JOIN pg_class c ON ((c.oid = s.stxrelid)))
+     JOIN pg_statistic_ext_data sd ON ((s.oid = sd.stxoid)))
+     LEFT JOIN pg_namespace cn ON ((cn.oid = c.relnamespace)))
+     LEFT JOIN pg_namespace sn ON ((sn.oid = s.stxnamespace)))
+     LEFT JOIN LATERAL ( SELECT array_agg(pg_mcv_list_items."values") AS most_common_vals,
+            array_agg(pg_mcv_list_items.nulls) AS most_common_val_nulls,
+            array_agg(pg_mcv_list_items.frequency) AS most_common_freqs,
+            array_agg(pg_mcv_list_items.base_frequency) AS most_common_base_freqs
+           FROM pg_mcv_list_items(sd.stxmcv) pg_mcv_list_items(index, "values", nulls, frequency, base_frequency)) m ON ((sd.stxmcv IS NOT NULL)))
+  WHERE ((NOT (EXISTS ( SELECT 1
+           FROM (unnest(s.stxkeys) k(k)
+             JOIN pg_attribute a ON (((a.attrelid = s.stxrelid) AND (a.attnum = k.k))))
+          WHERE (NOT has_column_privilege(c.oid, a.attnum, 'select'::text))))) AND ((c.relrowsecurity = false) OR (NOT row_security_active(c.oid))));
 pg_tables| SELECT n.nspname AS schemaname,
     c.relname AS tablename,
     pg_get_userbyid(c.relowner) AS tableowner,
-- 
2.20.1

#29

Dean Rasheed

dean.a.rasheed@gmail.com

over 6 years ago

In reply to: Tomas Vondra (#28)

Re: Multivariate MCV stats can leak data to unprivileged users

On Thu, 6 Jun 2019 at 21:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Hi,

Attached are three patches tweaking the stats - two were already posted
in this thread, the third one is just updating docs.

1) 0001 - split pg_statistic_ext into definition + data

This is pretty much the patch Dean posted some time ago, rebased to
current master (fixing just minor pgindent bitrot).

2) 0002 - update sgml docs to reflect changes from 0001

3) 0003 - define pg_stats_ext view, similar to pg_stats

Seems reasonable on a quick read-through, except I spotted a bug in
the view (my fault) -- the statistics_owner column should come from
s.stxowner rather than c.relowner.

The question is whether we want to also redesign pg_statistic_ext_data
per Tom's proposal (more about that later), but I think we can treat
that as an additional step on top of 0001. So I propose we get those
changes committed, and then perhaps also switch the data table to the
EAV model.

Barring objections, I'll do that early next week, after cleaning up
those patches a bit more.

One thing I think we should fix is naming of the attributes in the 0001
patch. At the moment both catalogs use "stx" prefix - e.g. "stxkind" is
in pg_statistic_ext, and "stxmcv" is in pg_statistic_ext_data. We should
probably switch to "stxd" in the _data catalog. Opinions?

Yes, that makes sense. Especially when joining the 2 tables, since it
makes it more obvious which table a given column is coming from in a
join clause.

Now, back to the proposal to split the _data catalog rows to EAV form,
with a new data type replacing the multiple types we have at the moment.
I've started hacking on it today, but the more I work on it the less
useful it seems to me.

My understanding is that with that approach we'd replace the _data
catalog (which currently has one column per statistic type, with a
separate data type) with 1:M generic rows, with a generic data type.
That is, we'd replace this

CREATE TABLE pg_statistic_ext_data (
stxoid OID,
stxdependencies pg_dependencies,
stxndistinct pg_ndistinct,
stxmcv pg_mcv_list,
... histograms ...
);

with something like this:

CREATE TABLE pg_statistiex_ext_data (
stxoid OID,
stxkind CHAR,
stxdata pg_statistic_ext_type
);

where pg_statistic_ext would store all existing statistic types. along
with a "flag" saying which value it actually stored (essentially a copy
of the stxkind column, which we however need to lookup a statistic of a
certain type, without having to detoast the statistic itself).

As I mentioned before, I kinda dislike the fact that this obfuscates the
actual statistic type by hiding it behing the "wrapper" type.

The other thing is that we have to deal with 1:M relationship every time
we (re)build the statistics, or when we need to access them. Now, it may
not be a huge amount of code, but it just seems unnecessary. It would
make sense if we planned to add large number of additional statistic
types, but that seems unlikely - I personally can think of maybe one new
statistic type, but that's about it.

I'll continue working on it and I'll share the results early next week,
after playing with it a bit, but I think we should get the existing
patches committed and then continue discussing this as an additional
improvement.

I wonder ... would it be completely crazy to just use a JSON column to
store the extended stats data?

It wouldn't be as compact as your representation, but it would allow
for future stats kinds without changing the catalog definitions, and
it wouldn't obfuscate the stats types. You could keep the 1:1
relationship, and have top-level JSON keys for each stats kind built,
and you wouldn't need the pg_mcv_list_items() function because you
could just put the MCV data in JSON arrays, which would be much more
transparent, and would make the user-accessible view much simpler. One
could also imagine writing regression tests that checked for specific
expected MCV values like "stxdata->'mcv'->'frequency'->0".

Just a thought.

Regards,
Dean

#30

John Naylor

john.naylor@2ndquadrant.com

over 6 years ago

In reply to: Tomas Vondra (#28)

Re: Multivariate MCV stats can leak data to unprivileged users

On Fri, Jun 7, 2019 at 4:33 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

2) 0002 - update sgml docs to reflect changes from 0001

There is some copypasta here in the new section referring to the old catalog:

+ <sect1 id="catalog-pg-statistic-ext-data">
+  <title><structname>pg_statistic_ext_data</structname></title>
+
+  <indexterm zone="catalog-pg-statistic-ext">
+   <primary>pg_statistic_ext</primary>
+  </indexterm>
+
+  <para>
+   The catalog <structname>pg_statistic_ext</structname>
+   holds extended planner statistics.
+   Each row in this catalog corresponds to a <firstterm>statistics
object</firstterm>
+   created with <xref linkend="sql-createstatistics"/>.
+  </para>

And a minor stylistic nit -- it might be good to capitalize "JOIN" and
"ON" in the queries in the docs and tests.

One thing I think we should fix is naming of the attributes in the 0001
patch. At the moment both catalogs use "stx" prefix - e.g. "stxkind" is
in pg_statistic_ext, and "stxmcv" is in pg_statistic_ext_data. We should
probably switch to "stxd" in the _data catalog. Opinions?

That's probably a good idea.

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#31

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Dean Rasheed (#29)

2 attachment(s)

Re: Multivariate MCV stats can leak data to unprivileged users

On Mon, Jun 10, 2019 at 02:32:04PM +0100, Dean Rasheed wrote:

On Thu, 6 Jun 2019 at 21:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Hi,

Attached are three patches tweaking the stats - two were already posted
in this thread, the third one is just updating docs.

1) 0001 - split pg_statistic_ext into definition + data

This is pretty much the patch Dean posted some time ago, rebased to
current master (fixing just minor pgindent bitrot).

2) 0002 - update sgml docs to reflect changes from 0001

3) 0003 - define pg_stats_ext view, similar to pg_stats

Seems reasonable on a quick read-through, except I spotted a bug in
the view (my fault) -- the statistics_owner column should come from
s.stxowner rather than c.relowner.

The question is whether we want to also redesign pg_statistic_ext_data
per Tom's proposal (more about that later), but I think we can treat
that as an additional step on top of 0001. So I propose we get those
changes committed, and then perhaps also switch the data table to the
EAV model.

Barring objections, I'll do that early next week, after cleaning up
those patches a bit more.

One thing I think we should fix is naming of the attributes in the 0001
patch. At the moment both catalogs use "stx" prefix - e.g. "stxkind" is
in pg_statistic_ext, and "stxmcv" is in pg_statistic_ext_data. We should
probably switch to "stxd" in the _data catalog. Opinions?

Yes, that makes sense. Especially when joining the 2 tables, since it
makes it more obvious which table a given column is coming from in a
join clause.

OK, attached are patches fixing the issues reported by you and John
Naylor, and squashing the parts into just two patches (catalog split and
pg_stats_ext). Barring objections, I'll push those tomorrow.

I've renamed columns in the _data catalog from 'stx' to 'stxd', which I
think is appropriate given the "data" in catalog name.

I'm wondering if we should change the examples in SGML docs (say, in
planstats.sgml) to use the new pg_stats_ext view, instead of querying the
catalogs directly. I've tried doing that, but I found the results less
readable than what we currently have (especially for the MCV list, where
it'd require matching elements in multiple arrays). So I've left this
unchanged for now.

Now, back to the proposal to split the _data catalog rows to EAV form,
with a new data type replacing the multiple types we have at the moment.
I've started hacking on it today, but the more I work on it the less
useful it seems to me.

My understanding is that with that approach we'd replace the _data
catalog (which currently has one column per statistic type, with a
separate data type) with 1:M generic rows, with a generic data type.
That is, we'd replace this

CREATE TABLE pg_statistic_ext_data (
stxoid OID,
stxdependencies pg_dependencies,
stxndistinct pg_ndistinct,
stxmcv pg_mcv_list,
... histograms ...
);

with something like this:

CREATE TABLE pg_statistiex_ext_data (
stxoid OID,
stxkind CHAR,
stxdata pg_statistic_ext_type
);

where pg_statistic_ext would store all existing statistic types. along
with a "flag" saying which value it actually stored (essentially a copy
of the stxkind column, which we however need to lookup a statistic of a
certain type, without having to detoast the statistic itself).

As I mentioned before, I kinda dislike the fact that this obfuscates the
actual statistic type by hiding it behing the "wrapper" type.

The other thing is that we have to deal with 1:M relationship every time
we (re)build the statistics, or when we need to access them. Now, it may
not be a huge amount of code, but it just seems unnecessary. It would
make sense if we planned to add large number of additional statistic
types, but that seems unlikely - I personally can think of maybe one new
statistic type, but that's about it.

I'll continue working on it and I'll share the results early next week,
after playing with it a bit, but I think we should get the existing
patches committed and then continue discussing this as an additional
improvement.

I wonder ... would it be completely crazy to just use a JSON column to
store the extended stats data?

It wouldn't be as compact as your representation, but it would allow
for future stats kinds without changing the catalog definitions, and
it wouldn't obfuscate the stats types. You could keep the 1:1
relationship, and have top-level JSON keys for each stats kind built,
and you wouldn't need the pg_mcv_list_items() function because you
could just put the MCV data in JSON arrays, which would be much more
transparent, and would make the user-accessible view much simpler. One
could also imagine writing regression tests that checked for specific
expected MCV values like "stxdata->'mcv'->'frequency'->0".

You mean storing it as JSONB, I presume?

I've actually considered that at some point, but eventually concluded it's
not a good match. I mean, JSON(B) is pretty versatile and can be whacked
to store pretty much anything, but it has various limitations - e.g. it
does not support arbitrary data types, so we'd have to store a lot of
stuff as text (through input/output functions). That doesn't seem very
nice, I guess.

If we want JSONB output, that should not be difficult to generate. But I
guess your point was about generic storage format.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Rework-the-pg_statistic_ext-catalog.patchtext/plain; charset=us-asciiDownload

From 55942036a1c00ea5a2bc818eca6e00c3cb469381 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Thu, 13 Jun 2019 17:19:21 +0200
Subject: [PATCH 1/2] Rework the pg_statistic_ext catalog

Since conception, the extended statistics were both defined and stored
in a single catalog pg_statistic_ext. That however is problematic when
a user is supposed to have access only to the definitions, but not to
data stored in the statistic objects.

Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it probably should, as it
contains user data), the dump would not define any statistic objects.
That would be a rather surprising behavior, though.

Until now this was not a pressing issue, because the existing statistic
types (dependencies and ndistinct coefficients) did not actually include
any user data directly. This however changed with introduction of MCV
lists, which does include values for the most common combinations.

The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.

Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
---
 doc/src/sgml/catalogs.sgml                  | 70 ++++++++++++++++----
 doc/src/sgml/func.sgml                      |  6 +-
 doc/src/sgml/perform.sgml                   | 20 +++---
 doc/src/sgml/planstats.sgml                 |  4 +-
 src/backend/catalog/Makefile                |  2 +-
 src/backend/commands/statscmds.c            | 73 ++++++++++++++++-----
 src/backend/optimizer/util/plancat.c        | 12 +++-
 src/backend/statistics/README.mcv           |  9 ++-
 src/backend/statistics/dependencies.c       |  7 +-
 src/backend/statistics/extended_stats.c     | 61 +++++++++--------
 src/backend/statistics/mcv.c                |  7 +-
 src/backend/statistics/mvdistinct.c         |  7 +-
 src/backend/utils/cache/syscache.c          | 12 ++++
 src/include/catalog/indexing.h              | 17 +++--
 src/include/catalog/pg_statistic_ext.h      |  9 +--
 src/include/catalog/pg_statistic_ext_data.h | 52 +++++++++++++++
 src/include/catalog/toasting.h              |  1 +
 src/include/utils/syscache.h                |  1 +
 src/test/regress/expected/oidjoins.out      |  8 +++
 src/test/regress/expected/sanity_check.out  |  1 +
 src/test/regress/expected/stats_ext.out     | 38 +++++++----
 src/test/regress/sql/oidjoins.sql           |  4 ++
 src/test/regress/sql/stats_ext.sql          | 30 ++++++---
 src/tools/pgindent/typedefs.list            |  2 +
 24 files changed, 336 insertions(+), 117 deletions(-)
 create mode 100644 src/include/catalog/pg_statistic_ext_data.h

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 36193d1491..ef4345524a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -297,7 +297,12 @@
 
      <row>
       <entry><link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link></entry>
-      <entry>extended planner statistics</entry>
+      <entry>extended planner statistics (definition)</entry>
+     </row>
+
+     <row>
+      <entry><link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link></entry>
+      <entry>extended planner statistics (built statistics)</entry>
      </row>
 
      <row>
@@ -6506,7 +6511,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 
   <para>
    The catalog <structname>pg_statistic_ext</structname>
-   holds extended planner statistics.
+   holds definitions of extended planner statistics.
    Each row in this catalog corresponds to a <firstterm>statistics object</firstterm>
    created with <xref linkend="sql-createstatistics"/>.
   </para>
@@ -6581,8 +6586,57 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </entry>
      </row>
 
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structfield>stxkind</structfield> field is filled at creation of the
+   statistics object, indicating which statistic type(s) are desired. The
+   statistics (once computed by <command>ANALYZE</command>) are stored in
+   <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link>
+   catalog.
+  </para>
+ </sect1>
+
+ <sect1 id="catalog-pg-statistic-ext-data">
+  <title><structname>pg_statistic_ext_data</structname></title>
+
+  <indexterm zone="catalog-pg-statistic-ext">
+   <primary>pg_statistic_ext_data</primary>
+  </indexterm>
+
+  <para>
+   The catalog <structname>pg_statistic_ext_data</structname>
+   holds data for extended planner statistics defined in <structname>pg_statistic_ext</structname>.
+   Each row in this catalog corresponds to a <firstterm>statistics object</firstterm>
+   created with <xref linkend="sql-createstatistics"/>.
+  </para>
+
+  <table>
+   <title><structname>pg_statistic_ext_data</structname> Columns</title>
+
+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+    <tbody>
+
+     <row>
+      <entry><structfield>stxoid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry><literal><link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link>.oid</literal></entry>
+      <entry>Extended statistic containing the definition for this data.</entry>
+     </row>
+
      <row>
-      <entry><structfield>stxndistinct</structfield></entry>
+      <entry><structfield>stxdndistinct</structfield></entry>
       <entry><type>pg_ndistinct</type></entry>
       <entry></entry>
       <entry>
@@ -6591,7 +6645,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
      </row>
 
      <row>
-      <entry><structfield>stxdependencies</structfield></entry>
+      <entry><structfield>stxddependencies</structfield></entry>
       <entry><type>pg_dependencies</type></entry>
       <entry></entry>
       <entry>
@@ -6601,7 +6655,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
      </row>
 
      <row>
-      <entry><structfield>stxmcv</structfield></entry>
+      <entry><structfield>stxdmcv</structfield></entry>
       <entry><type>pg_mcv_list</type></entry>
       <entry></entry>
       <entry>
@@ -6614,12 +6668,6 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
    </tgroup>
   </table>
 
-  <para>
-   The <structfield>stxkind</structfield> field is filled at creation of the
-   statistics object, indicating which statistic type(s) are desired.
-   The fields after it are initially NULL and are filled only when the
-   corresponding statistic has been computed by <command>ANALYZE</command>.
-  </para>
  </sect1>
 
  <sect1 id="catalog-pg-subscription">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a072b97616..e918133874 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22427,12 +22427,12 @@ CREATE EVENT TRIGGER test_table_rewrite_oid
     The <function>pg_mcv_list_items</function> function can be used like this:
 
 <programlisting>
-SELECT m.* FROM pg_statistic_ext,
-                pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts';
+SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
+                pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts';
 </programlisting>
 
      Values of the <type>pg_mcv_list</type> can be obtained only from the
-     <literal>pg_statistic_ext.stxmcv</literal> column.
+     <literal>pg_statistic_ext_data.stxdmcv</literal> column.
    </para>
   </sect2>
 
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index a84be85159..8e165832b3 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1076,6 +1076,10 @@ WHERE tablename = 'road';
     <primary>pg_statistic_ext</primary>
    </indexterm>
 
+   <indexterm>
+    <primary>pg_statistic_ext_data</primary>
+   </indexterm>
+
    <para>
     It is common to see slow queries running bad execution plans because
     multiple columns used in the query clauses are correlated.
@@ -1104,7 +1108,7 @@ WHERE tablename = 'road';
     interest in the statistics.  Actual data collection is performed
     by <command>ANALYZE</command> (either a manual command, or background
     auto-analyze).  The collected values can be examined in the
-    <link linkend="catalog-pg-statistic-ext"><structname>pg_statistic_ext</structname></link>
+    <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link>
     catalog.
    </para>
 
@@ -1172,10 +1176,10 @@ CREATE STATISTICS stts (dependencies) ON zip, city FROM zipcodes;
 
 ANALYZE zipcodes;
 
-SELECT stxname, stxkeys, stxdependencies
-  FROM pg_statistic_ext
+SELECT stxname, stxkeys, stxddependencies
+  FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid)
   WHERE stxname = 'stts';
- stxname | stxkeys |             stxdependencies               
+ stxname | stxkeys |             stxddependencies             
 ---------+---------+------------------------------------------
  stts    | 1 5     | {"1 => 5": 1.000000, "5 => 1": 0.423130}
 (1 row)
@@ -1262,8 +1266,8 @@ CREATE STATISTICS stts2 (ndistinct) ON zip, state, city FROM zipcodes;
 
 ANALYZE zipcodes;
 
-SELECT stxkeys AS k, stxndistinct AS nd
-  FROM pg_statistic_ext
+SELECT stxkeys AS k, stxdndistinct AS nd
+  FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid)
   WHERE stxname = 'stts2';
 -[ RECORD 1 ]--------------------------------------------------------
 k  | 1 2 5
@@ -1317,8 +1321,8 @@ CREATE STATISTICS stts3 (mcv) ON state, city FROM zipcodes;
 
 ANALYZE zipcodes;
 
-SELECT m.* FROM pg_statistic_ext,
-                pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts3';
+SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
+                pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts3';
 
  index |         values         | nulls | frequency | base_frequency 
 -------+------------------------+-------+-----------+----------------
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 4b1d3f4952..a25ce152ac 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -635,8 +635,8 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;
     <function>pg_mcv_list_items</function> set-returning function.
 
 <programlisting>
-SELECT m.* FROM pg_statistic_ext,
-                pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts2';
+SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
+                pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts2';
  index |  values  | nulls | frequency | base_frequency 
 -------+----------+-------+-----------+----------------
      0 | {0, 0}   | {f,f} |      0.01 |         0.0001
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index f186198fc6..8bece078dd 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -34,7 +34,7 @@ CATALOG_HEADERS := \
 	pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
 	pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
 	pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
-	pg_statistic_ext.h \
+	pg_statistic_ext.h pg_statistic_ext_data.h \
 	pg_statistic.h pg_rewrite.h pg_trigger.h pg_event_trigger.h pg_description.h \
 	pg_cast.h pg_enum.h pg_namespace.h pg_conversion.h pg_depend.h \
 	pg_database.h pg_db_role_setting.h pg_tablespace.h pg_pltemplate.h \
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 217d3a4533..cf406f6f96 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -23,6 +23,7 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "commands/comment.h"
 #include "commands/defrem.h"
 #include "miscadmin.h"
@@ -67,8 +68,11 @@ CreateStatistics(CreateStatsStmt *stmt)
 	HeapTuple	htup;
 	Datum		values[Natts_pg_statistic_ext];
 	bool		nulls[Natts_pg_statistic_ext];
+	Datum		datavalues[Natts_pg_statistic_ext_data];
+	bool		datanulls[Natts_pg_statistic_ext_data];
 	int2vector *stxkeys;
 	Relation	statrel;
+	Relation	datarel;
 	Relation	rel = NULL;
 	Oid			relid;
 	ObjectAddress parentobject,
@@ -336,11 +340,6 @@ CreateStatistics(CreateStatsStmt *stmt)
 	values[Anum_pg_statistic_ext_stxkeys - 1] = PointerGetDatum(stxkeys);
 	values[Anum_pg_statistic_ext_stxkind - 1] = PointerGetDatum(stxkind);
 
-	/* no statistics built yet */
-	nulls[Anum_pg_statistic_ext_stxndistinct - 1] = true;
-	nulls[Anum_pg_statistic_ext_stxdependencies - 1] = true;
-	nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
-
 	/* insert it into pg_statistic_ext */
 	htup = heap_form_tuple(statrel->rd_att, values, nulls);
 	CatalogTupleInsert(statrel, htup);
@@ -348,6 +347,29 @@ CreateStatistics(CreateStatsStmt *stmt)
 
 	relation_close(statrel, RowExclusiveLock);
 
+	/*
+	 * Also build the pg_statistic_ext_data tuple, to hold the actual
+	 * statistics data.
+	 */
+	datarel = table_open(StatisticExtDataRelationId, RowExclusiveLock);
+
+	memset(datavalues, 0, sizeof(datavalues));
+	memset(datanulls, false, sizeof(datanulls));
+
+	datavalues[Anum_pg_statistic_ext_data_stxoid - 1] = ObjectIdGetDatum(statoid);
+
+	/* no statistics built yet */
+	datanulls[Anum_pg_statistic_ext_data_stxdndistinct - 1] = true;
+	datanulls[Anum_pg_statistic_ext_data_stxddependencies - 1] = true;
+	datanulls[Anum_pg_statistic_ext_data_stxdmcv - 1] = true;
+
+	/* insert it into pg_statistic_ext_data */
+	htup = heap_form_tuple(datarel->rd_att, datavalues, datanulls);
+	CatalogTupleInsert(datarel, htup);
+	heap_freetuple(htup);
+
+	relation_close(datarel, RowExclusiveLock);
+
 	/*
 	 * Invalidate relcache so that others see the new statistics object.
 	 */
@@ -403,6 +425,23 @@ RemoveStatisticsById(Oid statsOid)
 	Form_pg_statistic_ext statext;
 	Oid			relid;
 
+	/*
+	 * First delete the pg_statistic_ext_data tuple holding the actual
+	 * statistical data.
+	 */
+	relation = table_open(StatisticExtDataRelationId, RowExclusiveLock);
+
+	tup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statsOid));
+
+	if (!HeapTupleIsValid(tup)) /* should not happen */
+		elog(ERROR, "cache lookup failed for statistics data %u", statsOid);
+
+	CatalogTupleDelete(relation, &tup->t_self);
+
+	ReleaseSysCache(tup);
+
+	table_close(relation, RowExclusiveLock);
+
 	/*
 	 * Delete the pg_statistic_ext tuple.  Also send out a cache inval on the
 	 * associated table, so that dependent plans will be rebuilt.
@@ -431,8 +470,8 @@ RemoveStatisticsById(Oid statsOid)
  *
  * This could throw an error if the type change can't be supported.
  * If it can be supported, but the stats must be recomputed, a likely choice
- * would be to set the relevant column(s) of the pg_statistic_ext tuple to
- * null until the next ANALYZE.  (Note that the type change hasn't actually
+ * would be to set the relevant column(s) of the pg_statistic_ext_data tuple
+ * to null until the next ANALYZE.  (Note that the type change hasn't actually
  * happened yet, so one option that's *not* on the table is to recompute
  * immediately.)
  *
@@ -456,11 +495,11 @@ UpdateStatisticsForTypeChange(Oid statsOid, Oid relationOid, int attnum,
 
 	Relation	rel;
 
-	Datum		values[Natts_pg_statistic_ext];
-	bool		nulls[Natts_pg_statistic_ext];
-	bool		replaces[Natts_pg_statistic_ext];
+	Datum		values[Natts_pg_statistic_ext_data];
+	bool		nulls[Natts_pg_statistic_ext_data];
+	bool		replaces[Natts_pg_statistic_ext_data];
 
-	oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statsOid));
+	oldtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statsOid));
 	if (!HeapTupleIsValid(oldtup))
 		elog(ERROR, "cache lookup failed for statistics object %u", statsOid);
 
@@ -479,14 +518,14 @@ UpdateStatisticsForTypeChange(Oid statsOid, Oid relationOid, int attnum,
 	 * OK, we need to reset some statistics. So let's build the new tuple,
 	 * replacing the affected statistics types with NULL.
 	 */
-	memset(nulls, 0, Natts_pg_statistic_ext * sizeof(bool));
-	memset(replaces, 0, Natts_pg_statistic_ext * sizeof(bool));
-	memset(values, 0, Natts_pg_statistic_ext * sizeof(Datum));
+	memset(nulls, 0, Natts_pg_statistic_ext_data * sizeof(bool));
+	memset(replaces, 0, Natts_pg_statistic_ext_data * sizeof(bool));
+	memset(values, 0, Natts_pg_statistic_ext_data * sizeof(Datum));
 
-	replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
-	nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxdmcv - 1] = true;
+	nulls[Anum_pg_statistic_ext_data_stxdmcv - 1] = true;
 
-	rel = heap_open(StatisticExtRelationId, RowExclusiveLock);
+	rel = heap_open(StatisticExtDataRelationId, RowExclusiveLock);
 
 	/* replace the old tuple */
 	stup = heap_modify_tuple(oldtup,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 2405acbf6f..40f497660d 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1308,6 +1308,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 		Oid			statOid = lfirst_oid(l);
 		Form_pg_statistic_ext staForm;
 		HeapTuple	htup;
+		HeapTuple	dtup;
 		Bitmapset  *keys = NULL;
 		int			i;
 
@@ -1316,6 +1317,10 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 			elog(ERROR, "cache lookup failed for statistics object %u", statOid);
 		staForm = (Form_pg_statistic_ext) GETSTRUCT(htup);
 
+		dtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statOid));
+		if (!HeapTupleIsValid(dtup))
+			elog(ERROR, "cache lookup failed for statistics object %u", statOid);
+
 		/*
 		 * First, build the array of columns covered.  This is ultimately
 		 * wasted if no stats within the object have actually been built, but
@@ -1325,7 +1330,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 			keys = bms_add_member(keys, staForm->stxkeys.values[i]);
 
 		/* add one StatisticExtInfo for each kind built */
-		if (statext_is_kind_built(htup, STATS_EXT_NDISTINCT))
+		if (statext_is_kind_built(dtup, STATS_EXT_NDISTINCT))
 		{
 			StatisticExtInfo *info = makeNode(StatisticExtInfo);
 
@@ -1337,7 +1342,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 			stainfos = lcons(info, stainfos);
 		}
 
-		if (statext_is_kind_built(htup, STATS_EXT_DEPENDENCIES))
+		if (statext_is_kind_built(dtup, STATS_EXT_DEPENDENCIES))
 		{
 			StatisticExtInfo *info = makeNode(StatisticExtInfo);
 
@@ -1349,7 +1354,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 			stainfos = lcons(info, stainfos);
 		}
 
-		if (statext_is_kind_built(htup, STATS_EXT_MCV))
+		if (statext_is_kind_built(dtup, STATS_EXT_MCV))
 		{
 			StatisticExtInfo *info = makeNode(StatisticExtInfo);
 
@@ -1362,6 +1367,7 @@ get_relation_statistics(RelOptInfo *rel, Relation relation)
 		}
 
 		ReleaseSysCache(htup);
+		ReleaseSysCache(dtup);
 		bms_free(keys);
 	}
 
diff --git a/src/backend/statistics/README.mcv b/src/backend/statistics/README.mcv
index c18878f5d2..8455b0d13f 100644
--- a/src/backend/statistics/README.mcv
+++ b/src/backend/statistics/README.mcv
@@ -86,11 +86,14 @@ So instead the MCV lists are stored in a custom data type (pg_mcv_list),
 which however makes it more difficult to inspect the contents. To make that
 easier, there's a SRF returning detailed information about the MCV lists.
 
-    SELECT m.* FROM pg_statistic_ext,
-                    pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts2';
+    SELECT m.* FROM pg_statistic_ext s,
+                    pg_statistic_ext_data d,
+                    pg_mcv_list_items(stxdmcv) m
+              WHERE s.stxname = 'stts2'
+                AND d.stxoid = s.oid;
 
 It accepts one parameter - a pg_mcv_list value (which can only be obtained
-from pg_statistic_ext catalog, to defend against malicious input), and
+from pg_statistic_ext_data catalog, to defend against malicious input), and
 returns these columns:
 
     - item index (0, ..., (nitems-1))
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index cd318faf3b..66c38ce2bc 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -17,6 +17,7 @@
 #include "access/sysattr.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "lib/stringinfo.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
@@ -637,12 +638,12 @@ statext_dependencies_load(Oid mvoid)
 	Datum		deps;
 	HeapTuple	htup;
 
-	htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
+	htup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(mvoid));
 	if (!HeapTupleIsValid(htup))
 		elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
 
-	deps = SysCacheGetAttr(STATEXTOID, htup,
-						   Anum_pg_statistic_ext_stxdependencies, &isnull);
+	deps = SysCacheGetAttr(STATEXTDATASTXOID, htup,
+						   Anum_pg_statistic_ext_data_stxddependencies, &isnull);
 	if (isnull)
 		elog(ERROR,
 			 "requested statistic kind \"%c\" is not yet built for statistics object %u",
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index ab187915c1..96db32f0a0 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -23,6 +23,7 @@
 #include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
@@ -65,9 +66,9 @@ typedef struct StatExtEntry
 static List *fetch_statentries_for_relation(Relation pg_statext, Oid relid);
 static VacAttrStats **lookup_var_attr_stats(Relation rel, Bitmapset *attrs,
 											int nvacatts, VacAttrStats **vacatts);
-static void statext_store(Relation pg_stext, Oid relid,
+static void statext_store(Oid relid,
 						  MVNDistinct *ndistinct, MVDependencies *dependencies,
-						  MCVList *mcvlist, VacAttrStats **stats);
+						  MCVList *mcv, VacAttrStats **stats);
 
 
 /*
@@ -145,7 +146,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 		}
 
 		/* store the statistics in the catalog */
-		statext_store(pg_stext, stat->statOid, ndistinct, dependencies, mcv, stats);
+		statext_store(stat->statOid, ndistinct, dependencies, mcv, stats);
 	}
 
 	table_close(pg_stext, RowExclusiveLock);
@@ -156,7 +157,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 
 /*
  * statext_is_kind_built
- *		Is this stat kind built in the given pg_statistic_ext tuple?
+ *		Is this stat kind built in the given pg_statistic_ext_data tuple?
  */
 bool
 statext_is_kind_built(HeapTuple htup, char type)
@@ -166,15 +167,15 @@ statext_is_kind_built(HeapTuple htup, char type)
 	switch (type)
 	{
 		case STATS_EXT_NDISTINCT:
-			attnum = Anum_pg_statistic_ext_stxndistinct;
+			attnum = Anum_pg_statistic_ext_data_stxdndistinct;
 			break;
 
 		case STATS_EXT_DEPENDENCIES:
-			attnum = Anum_pg_statistic_ext_stxdependencies;
+			attnum = Anum_pg_statistic_ext_data_stxddependencies;
 			break;
 
 		case STATS_EXT_MCV:
-			attnum = Anum_pg_statistic_ext_stxmcv;
+			attnum = Anum_pg_statistic_ext_data_stxdmcv;
 			break;
 
 		default:
@@ -312,70 +313,76 @@ lookup_var_attr_stats(Relation rel, Bitmapset *attrs,
 
 /*
  * statext_store
- *	Serializes the statistics and stores them into the pg_statistic_ext tuple.
+ *	Serializes the statistics and stores them into the pg_statistic_ext_data
+ *	tuple.
  */
 static void
-statext_store(Relation pg_stext, Oid statOid,
+statext_store(Oid statOid,
 			  MVNDistinct *ndistinct, MVDependencies *dependencies,
 			  MCVList *mcv, VacAttrStats **stats)
 {
 	HeapTuple	stup,
 				oldtup;
-	Datum		values[Natts_pg_statistic_ext];
-	bool		nulls[Natts_pg_statistic_ext];
-	bool		replaces[Natts_pg_statistic_ext];
+	Datum		values[Natts_pg_statistic_ext_data];
+	bool		nulls[Natts_pg_statistic_ext_data];
+	bool		replaces[Natts_pg_statistic_ext_data];
+	Relation	pg_stextdata;
 
 	memset(nulls, true, sizeof(nulls));
 	memset(replaces, false, sizeof(replaces));
 	memset(values, 0, sizeof(values));
 
 	/*
-	 * Construct a new pg_statistic_ext tuple, replacing the calculated stats.
+	 * Construct a new pg_statistic_ext_data tuple, replacing the calculated
+	 * stats.
 	 */
 	if (ndistinct != NULL)
 	{
 		bytea	   *data = statext_ndistinct_serialize(ndistinct);
 
-		nulls[Anum_pg_statistic_ext_stxndistinct - 1] = (data == NULL);
-		values[Anum_pg_statistic_ext_stxndistinct - 1] = PointerGetDatum(data);
+		nulls[Anum_pg_statistic_ext_data_stxdndistinct - 1] = (data == NULL);
+		values[Anum_pg_statistic_ext_data_stxdndistinct - 1] = PointerGetDatum(data);
 	}
 
 	if (dependencies != NULL)
 	{
 		bytea	   *data = statext_dependencies_serialize(dependencies);
 
-		nulls[Anum_pg_statistic_ext_stxdependencies - 1] = (data == NULL);
-		values[Anum_pg_statistic_ext_stxdependencies - 1] = PointerGetDatum(data);
+		nulls[Anum_pg_statistic_ext_data_stxddependencies - 1] = (data == NULL);
+		values[Anum_pg_statistic_ext_data_stxddependencies - 1] = PointerGetDatum(data);
 	}
-
 	if (mcv != NULL)
 	{
 		bytea	   *data = statext_mcv_serialize(mcv, stats);
 
-		nulls[Anum_pg_statistic_ext_stxmcv - 1] = (data == NULL);
-		values[Anum_pg_statistic_ext_stxmcv - 1] = PointerGetDatum(data);
+		nulls[Anum_pg_statistic_ext_data_stxdmcv - 1] = (data == NULL);
+		values[Anum_pg_statistic_ext_data_stxdmcv - 1] = PointerGetDatum(data);
 	}
 
 	/* always replace the value (either by bytea or NULL) */
-	replaces[Anum_pg_statistic_ext_stxndistinct - 1] = true;
-	replaces[Anum_pg_statistic_ext_stxdependencies - 1] = true;
-	replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxdndistinct - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxddependencies - 1] = true;
+	replaces[Anum_pg_statistic_ext_data_stxdmcv - 1] = true;
 
-	/* there should already be a pg_statistic_ext tuple */
-	oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statOid));
+	/* there should already be a pg_statistic_ext_data tuple */
+	oldtup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(statOid));
 	if (!HeapTupleIsValid(oldtup))
 		elog(ERROR, "cache lookup failed for statistics object %u", statOid);
 
 	/* replace it */
+	pg_stextdata = table_open(StatisticExtDataRelationId, RowExclusiveLock);
+
 	stup = heap_modify_tuple(oldtup,
-							 RelationGetDescr(pg_stext),
+							 RelationGetDescr(pg_stextdata),
 							 values,
 							 nulls,
 							 replaces);
 	ReleaseSysCache(oldtup);
-	CatalogTupleUpdate(pg_stext, &stup->t_self, stup);
+	CatalogTupleUpdate(pg_stextdata, &stup->t_self, stup);
 
 	heap_freetuple(stup);
+
+	table_close(pg_stextdata, RowExclusiveLock);
 }
 
 /* initialize multi-dimensional sort */
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index d1f0fd55e8..2feb17ed44 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -19,6 +19,7 @@
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "fmgr.h"
 #include "funcapi.h"
 #include "nodes/nodeFuncs.h"
@@ -429,13 +430,13 @@ statext_mcv_load(Oid mvoid)
 	MCVList    *result;
 	bool		isnull;
 	Datum		mcvlist;
-	HeapTuple	htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
+	HeapTuple	htup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(mvoid));
 
 	if (!HeapTupleIsValid(htup))
 		elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
 
-	mcvlist = SysCacheGetAttr(STATEXTOID, htup,
-							  Anum_pg_statistic_ext_stxmcv, &isnull);
+	mcvlist = SysCacheGetAttr(STATEXTDATASTXOID, htup,
+							  Anum_pg_statistic_ext_data_stxdmcv, &isnull);
 
 	if (isnull)
 		elog(ERROR,
diff --git a/src/backend/statistics/mvdistinct.c b/src/backend/statistics/mvdistinct.c
index 7432a6a396..9ebf183d90 100644
--- a/src/backend/statistics/mvdistinct.c
+++ b/src/backend/statistics/mvdistinct.c
@@ -27,6 +27,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "lib/stringinfo.h"
@@ -145,12 +146,12 @@ statext_ndistinct_load(Oid mvoid)
 	Datum		ndist;
 	HeapTuple	htup;
 
-	htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(mvoid));
+	htup = SearchSysCache1(STATEXTDATASTXOID, ObjectIdGetDatum(mvoid));
 	if (!HeapTupleIsValid(htup))
 		elog(ERROR, "cache lookup failed for statistics object %u", mvoid);
 
-	ndist = SysCacheGetAttr(STATEXTOID, htup,
-							Anum_pg_statistic_ext_stxndistinct, &isnull);
+	ndist = SysCacheGetAttr(STATEXTDATASTXOID, htup,
+							Anum_pg_statistic_ext_data_stxdndistinct, &isnull);
 	if (isnull)
 		elog(ERROR,
 			 "requested statistic kind \"%c\" is not yet built for statistics object %u",
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 476538354d..99976468e5 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -62,6 +62,7 @@
 #include "catalog/pg_replication_origin.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
 #include "catalog/pg_subscription.h"
 #include "catalog/pg_subscription_rel.h"
 #include "catalog/pg_tablespace.h"
@@ -727,6 +728,17 @@ static const struct cachedesc cacheinfo[] = {
 		},
 		32
 	},
+	{StatisticExtDataRelationId, /* STATEXTDATASTXOID */
+		StatisticExtDataStxoidIndexId,
+		1,
+		{
+			Anum_pg_statistic_ext_data_stxoid,
+			0,
+			0,
+			0
+		},
+		4
+	},
 	{StatisticExtRelationId,	/* STATEXTNAMENSP */
 		StatisticExtNameIndexId,
 		2,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 5f2aee8a4a..ef4445b017 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -186,13 +186,6 @@ DECLARE_UNIQUE_INDEX(pg_largeobject_loid_pn_index, 2683, on pg_largeobject using
 DECLARE_UNIQUE_INDEX(pg_largeobject_metadata_oid_index, 2996, on pg_largeobject_metadata using btree(oid oid_ops));
 #define LargeObjectMetadataOidIndexId	2996
 
-DECLARE_UNIQUE_INDEX(pg_statistic_ext_oid_index, 3380, on pg_statistic_ext using btree(oid oid_ops));
-#define StatisticExtOidIndexId	3380
-DECLARE_UNIQUE_INDEX(pg_statistic_ext_name_index, 3997, on pg_statistic_ext using btree(stxname name_ops, stxnamespace oid_ops));
-#define StatisticExtNameIndexId 3997
-DECLARE_INDEX(pg_statistic_ext_relid_index, 3379, on pg_statistic_ext using btree(stxrelid oid_ops));
-#define StatisticExtRelidIndexId 3379
-
 DECLARE_UNIQUE_INDEX(pg_namespace_nspname_index, 2684, on pg_namespace using btree(nspname name_ops));
 #define NamespaceNameIndexId  2684
 DECLARE_UNIQUE_INDEX(pg_namespace_oid_index, 2685, on pg_namespace using btree(oid oid_ops));
@@ -237,6 +230,16 @@ DECLARE_INDEX(pg_shdepend_reference_index, 1233, on pg_shdepend using btree(refc
 DECLARE_UNIQUE_INDEX(pg_statistic_relid_att_inh_index, 2696, on pg_statistic using btree(starelid oid_ops, staattnum int2_ops, stainherit bool_ops));
 #define StatisticRelidAttnumInhIndexId	2696
 
+DECLARE_UNIQUE_INDEX(pg_statistic_ext_oid_index, 3380, on pg_statistic_ext using btree(oid oid_ops));
+#define StatisticExtOidIndexId	3380
+DECLARE_UNIQUE_INDEX(pg_statistic_ext_name_index, 3997, on pg_statistic_ext using btree(stxname name_ops, stxnamespace oid_ops));
+#define StatisticExtNameIndexId 3997
+DECLARE_INDEX(pg_statistic_ext_relid_index, 3379, on pg_statistic_ext using btree(stxrelid oid_ops));
+#define StatisticExtRelidIndexId 3379
+
+DECLARE_UNIQUE_INDEX(pg_statistic_ext_data_stxoid_index, 3433, on pg_statistic_ext_data using btree(stxoid oid_ops));
+#define StatisticExtDataStxoidIndexId 3433
+
 DECLARE_UNIQUE_INDEX(pg_tablespace_oid_index, 2697, on pg_tablespace using btree(oid oid_ops));
 #define TablespaceOidIndexId  2697
 DECLARE_UNIQUE_INDEX(pg_tablespace_spcname_index, 2698, on pg_tablespace using btree(spcname name_ops));
diff --git a/src/include/catalog/pg_statistic_ext.h b/src/include/catalog/pg_statistic_ext.h
index e449f9efe8..d8c5e0651e 100644
--- a/src/include/catalog/pg_statistic_ext.h
+++ b/src/include/catalog/pg_statistic_ext.h
@@ -1,8 +1,12 @@
 /*-------------------------------------------------------------------------
  *
  * pg_statistic_ext.h
- *	  definition of the "extended statistics" system catalog (pg_statistic_ext)
+ *	  definition of the "extended statistics" system catalog
+ *	  (pg_statistic_ext)
  *
+ * Note that pg_statistic_ext contains the definitions of extended statistics
+ * objects, created by CREATE STATISTICS, but not the actual statistical data,
+ * created by running ANALYZE.
  *
  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -47,9 +51,6 @@ CATALOG(pg_statistic_ext,3381,StatisticExtRelationId)
 #ifdef CATALOG_VARLEN
 	char		stxkind[1] BKI_FORCE_NOT_NULL;	/* statistics kinds requested
 												 * to build */
-	pg_ndistinct stxndistinct;	/* ndistinct coefficients (serialized) */
-	pg_dependencies stxdependencies;	/* dependencies (serialized) */
-	pg_mcv_list stxmcv;			/* MCV (serialized) */
 #endif
 
 } FormData_pg_statistic_ext;
diff --git a/src/include/catalog/pg_statistic_ext_data.h b/src/include/catalog/pg_statistic_ext_data.h
new file mode 100644
index 0000000000..5da9bc8ae2
--- /dev/null
+++ b/src/include/catalog/pg_statistic_ext_data.h
@@ -0,0 +1,52 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_statistic_ext_data.h
+ *	  definition of the "extended statistics data" system catalog
+ *	  (pg_statistic_ext_data)
+ *
+ * This catalog stores the statistical data for extended statistics objects.
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/catalog/pg_statistic_ext_data.h
+ *
+ * NOTES
+ *	  The Catalog.pm module reads this file and derives schema
+ *	  information.
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_STATISTIC_EXT_DATA_H
+#define PG_STATISTIC_EXT_DATA_H
+
+#include "catalog/genbki.h"
+#include "catalog/pg_statistic_ext_data_d.h"
+
+/* ----------------
+ *		pg_statistic_ext_data definition.  cpp turns this into
+ *		typedef struct FormData_pg_statistic_ext_data
+ * ----------------
+ */
+CATALOG(pg_statistic_ext_data,3429,StatisticExtDataRelationId)
+{
+	Oid			stxoid;			/* statistics object this data is for */
+
+#ifdef CATALOG_VARLEN			/* variable-length fields start here */
+
+	pg_ndistinct stxdndistinct;	/* ndistinct coefficients (serialized) */
+	pg_dependencies stxddependencies;	/* dependencies (serialized) */
+	pg_mcv_list stxdmcv;			/* MCV (serialized) */
+
+#endif
+
+} FormData_pg_statistic_ext_data;
+
+/* ----------------
+ *		Form_pg_statistic_ext_data corresponds to a pointer to a tuple with
+ *		the format of pg_statistic_ext_data relation.
+ * ----------------
+ */
+typedef FormData_pg_statistic_ext_data *Form_pg_statistic_ext_data;
+
+#endif							/* PG_STATISTIC_EXT_DATA_H */
diff --git a/src/include/catalog/toasting.h b/src/include/catalog/toasting.h
index a9e633d6bb..cc5dfed0bf 100644
--- a/src/include/catalog/toasting.h
+++ b/src/include/catalog/toasting.h
@@ -70,6 +70,7 @@ DECLARE_TOAST(pg_rewrite, 2838, 2839);
 DECLARE_TOAST(pg_seclabel, 3598, 3599);
 DECLARE_TOAST(pg_statistic, 2840, 2841);
 DECLARE_TOAST(pg_statistic_ext, 3439, 3440);
+DECLARE_TOAST(pg_statistic_ext_data, 3430, 3431);
 DECLARE_TOAST(pg_trigger, 2336, 2337);
 DECLARE_TOAST(pg_ts_dict, 4169, 4170);
 DECLARE_TOAST(pg_type, 4171, 4172);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index a6307474ee..918765cc99 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -86,6 +86,7 @@ enum SysCacheIdentifier
 	REPLORIGNAME,
 	RULERELNAME,
 	SEQRELID,
+	STATEXTDATASTXOID,
 	STATEXTNAMENSP,
 	STATEXTOID,
 	STATRELATTINH,
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 4edc8175aa..1302cc271b 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -985,6 +985,14 @@ WHERE	stxowner != 0 AND
 ------+----------
 (0 rows)
 
+SELECT	ctid, stxoid
+FROM	pg_catalog.pg_statistic_ext_data fk
+WHERE	stxoid != 0 AND
+	NOT EXISTS(SELECT 1 FROM pg_catalog.pg_statistic_ext pk WHERE pk.oid = fk.stxoid);
+ ctid | stxoid 
+------+--------
+(0 rows)
+
 SELECT	ctid, spcowner
 FROM	pg_catalog.pg_tablespace fk
 WHERE	spcowner != 0 AND
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index 392e8a4957..8ff0da185e 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -149,6 +149,7 @@ pg_shdescription|t
 pg_shseclabel|t
 pg_statistic|t
 pg_statistic_ext|t
+pg_statistic_ext_data|t
 pg_subscription|t
 pg_subscription_rel|t
 pg_tablespace|t
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 046d0b1721..def95d80c9 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -199,9 +199,11 @@ SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY b, c
 -- correct command
 CREATE STATISTICS s10 ON a, b, c FROM ndistinct;
 ANALYZE ndistinct;
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
- stxkind |                    stxndistinct                     
+SELECT s.stxkind, d.stxdndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
+ stxkind |                    stxdndistinct                    
 ---------+-----------------------------------------------------
  {d,f,m} | {"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}
 (1 row)
@@ -246,9 +248,11 @@ INSERT INTO ndistinct (a, b, c, filler1)
             cash_words(mod(i,33)::int::money)
        FROM generate_series(1,5000) s(i);
 ANALYZE ndistinct;
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
- stxkind |                        stxndistinct                        
+SELECT s.stxkind, d.stxdndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
+ stxkind |                       stxdndistinct                        
 ---------+------------------------------------------------------------
  {d,f,m} | {"3, 4": 2550, "3, 6": 800, "4, 6": 1632, "3, 4, 6": 5000}
 (1 row)
@@ -285,10 +289,12 @@ SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, d
 (1 row)
 
 DROP STATISTICS s10;
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
- stxkind | stxndistinct 
----------+--------------
+SELECT s.stxkind, d.stxdndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
+ stxkind | stxdndistinct 
+---------+---------------
 (0 rows)
 
 -- dropping the statistics results in under-estimates
@@ -537,7 +543,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= 4 AND b <
 
 -- check change of unrelated column type does not reset the MCV statistics
 ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
-SELECT stxmcv IS NOT NULL FROM pg_statistic_ext WHERE stxname = 'mcv_lists_stats';
+SELECT d.stxdmcv IS NOT NULL
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
  ?column? 
 ----------
  t
@@ -600,8 +609,11 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a IS NULL AND
 TRUNCATE mcv_lists;
 INSERT INTO mcv_lists (a, b, c) SELECT 1, 2, 3 FROM generate_series(1,1000) s(i);
 ANALYZE mcv_lists;
-SELECT m.* FROM pg_statistic_ext,
-              pg_mcv_list_items(stxmcv) m WHERE stxname = 'mcv_lists_stats';
+SELECT m.*
+  FROM pg_statistic_ext s, pg_statistic_ext_data d,
+       pg_mcv_list_items(d.stxdmcv) m
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
  index |  values   |  nulls  | frequency | base_frequency 
 -------+-----------+---------+-----------+----------------
      0 | {1, 2, 3} | {f,f,f} |         1 |              1
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index dbe4a5857d..b774cbca5b 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -493,6 +493,10 @@ SELECT	ctid, stxowner
 FROM	pg_catalog.pg_statistic_ext fk
 WHERE	stxowner != 0 AND
 	NOT EXISTS(SELECT 1 FROM pg_catalog.pg_authid pk WHERE pk.oid = fk.stxowner);
+SELECT	ctid, stxoid
+FROM	pg_catalog.pg_statistic_ext_data fk
+WHERE	stxoid != 0 AND
+	NOT EXISTS(SELECT 1 FROM pg_catalog.pg_statistic_ext pk WHERE pk.oid = fk.stxoid);
 SELECT	ctid, spcowner
 FROM	pg_catalog.pg_tablespace fk
 WHERE	spcowner != 0 AND
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index d333730872..3aa99d7bc8 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -144,8 +144,10 @@ CREATE STATISTICS s10 ON a, b, c FROM ndistinct;
 
 ANALYZE ndistinct;
 
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxdndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
 
 -- Hash Aggregate, thanks to estimates improved by the statistic
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, b');
@@ -170,8 +172,10 @@ INSERT INTO ndistinct (a, b, c, filler1)
 
 ANALYZE ndistinct;
 
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxdndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
 
 -- correct esimates
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, b');
@@ -186,8 +190,10 @@ SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, d
 
 DROP STATISTICS s10;
 
-SELECT stxkind, stxndistinct
-  FROM pg_statistic_ext WHERE stxrelid = 'ndistinct'::regclass;
+SELECT s.stxkind, d.stxdndistinct
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxrelid = 'ndistinct'::regclass
+   AND d.stxoid = s.oid;
 
 -- dropping the statistics results in under-estimates
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, b');
@@ -335,7 +341,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a <= 4 AND b <
 -- check change of unrelated column type does not reset the MCV statistics
 ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
 
-SELECT stxmcv IS NOT NULL FROM pg_statistic_ext WHERE stxname = 'mcv_lists_stats';
+SELECT d.stxdmcv IS NOT NULL
+  FROM pg_statistic_ext s, pg_statistic_ext_data d
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
 
 -- check change of column type resets the MCV statistics
 ALTER TABLE mcv_lists ALTER COLUMN c TYPE numeric;
@@ -378,8 +387,11 @@ TRUNCATE mcv_lists;
 INSERT INTO mcv_lists (a, b, c) SELECT 1, 2, 3 FROM generate_series(1,1000) s(i);
 ANALYZE mcv_lists;
 
-SELECT m.* FROM pg_statistic_ext,
-              pg_mcv_list_items(stxmcv) m WHERE stxname = 'mcv_lists_stats';
+SELECT m.*
+  FROM pg_statistic_ext s, pg_statistic_ext_data d,
+       pg_mcv_list_items(d.stxdmcv) m
+ WHERE s.stxname = 'mcv_lists_stats'
+   AND d.stxoid = s.oid;
 
 -- mcv with arrays
 CREATE TABLE mcv_lists_arrays (
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8cc033eb13..bdcbc8d15e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -729,6 +729,7 @@ FormData_pg_sequence_data
 FormData_pg_shdepend
 FormData_pg_statistic
 FormData_pg_statistic_ext
+FormData_pg_statistic_ext_data
 FormData_pg_subscription
 FormData_pg_subscription_rel
 FormData_pg_tablespace
@@ -786,6 +787,7 @@ Form_pg_sequence_data
 Form_pg_shdepend
 Form_pg_statistic
 Form_pg_statistic_ext
+Form_pg_statistic_ext_data
 Form_pg_subscription
 Form_pg_subscription_rel
 Form_pg_tablespace
-- 
2.20.1

0002-Add-pg_stats_ext-view.patchtext/plain; charset=us-asciiDownload

From 619dc7fac8d8ee57d0fef83fae0ee6b102e9bf14 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Thu, 13 Jun 2019 17:25:04 +0200
Subject: [PATCH 2/2] Add pg_stats_ext view

Introduces a view on top of pg_statistic_ext and pg_statistic_ext_data,
showing data in a way that is easier to read for humans. This is similar
to what pg_stats does for pg_statistic.
---
 src/backend/catalog/system_views.sql | 41 ++++++++++++++++++++++++++++
 src/test/regress/expected/rules.out  | 29 ++++++++++++++++++++
 2 files changed, 70 insertions(+)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 78a103cdb9..c889890118 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -253,6 +253,47 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
 
 REVOKE ALL on pg_statistic FROM public;
 
+CREATE VIEW pg_stats_ext WITH (security_barrier) AS
+    SELECT cn.nspname AS schemaname,
+           c.relname AS tablename,
+           sn.nspname AS statistics_schemaname,
+           s.stxname AS statistics_name,
+           pg_get_userbyid(s.stxowner) AS statistics_owner,
+           ( SELECT array_agg(a.attname ORDER BY a.attnum)
+             FROM unnest(s.stxkeys) k
+                  JOIN pg_attribute a
+                       ON (a.attrelid = s.stxrelid AND a.attnum = k)
+           ) AS attnames,
+           s.stxkind AS kinds,
+           sd.stxdndistinct AS n_distinct,
+           sd.stxddependencies AS dependencies,
+           m.most_common_vals,
+           m.most_common_val_nulls,
+           m.most_common_freqs,
+           m.most_common_base_freqs
+    FROM pg_statistic_ext s JOIN pg_class c ON (c.oid = s.stxrelid)
+         JOIN pg_statistic_ext_data sd ON (s.oid = sd.stxoid)
+         LEFT JOIN pg_namespace cn ON (cn.oid = c.relnamespace)
+         LEFT JOIN pg_namespace sn ON (sn.oid = s.stxnamespace)
+         LEFT JOIN LATERAL
+                   ( SELECT array_agg(values) AS most_common_vals,
+                            array_agg(nulls) AS most_common_val_nulls,
+                            array_agg(frequency) AS most_common_freqs,
+                            array_agg(base_frequency) AS most_common_base_freqs
+                     FROM pg_mcv_list_items(sd.stxdmcv)
+                   ) m ON sd.stxdmcv IS NOT NULL
+    WHERE NOT EXISTS
+              ( SELECT 1
+                FROM unnest(stxkeys) k
+                     JOIN pg_attribute a
+                          ON (a.attrelid = s.stxrelid AND a.attnum = k)
+                WHERE NOT has_column_privilege(c.oid, a.attnum, 'select') )
+    AND (c.relrowsecurity = false OR NOT row_security_active(c.oid));
+
+REVOKE ALL on pg_statistic_ext FROM public;
+GRANT SELECT (oid, stxrelid, stxname, stxnamespace, stxowner, stxkeys, stxkind)
+    ON pg_statistic_ext TO public;
+
 CREATE VIEW pg_publication_tables AS
     SELECT
         P.pubname AS pubname,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 7d365c48d1..210e9cd146 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2284,6 +2284,35 @@ pg_stats| SELECT n.nspname AS schemaname,
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
   WHERE ((NOT a.attisdropped) AND has_column_privilege(c.oid, a.attnum, 'select'::text) AND ((c.relrowsecurity = false) OR (NOT row_security_active(c.oid))));
+pg_stats_ext| SELECT cn.nspname AS schemaname,
+    c.relname AS tablename,
+    sn.nspname AS statistics_schemaname,
+    s.stxname AS statistics_name,
+    pg_get_userbyid(s.stxowner) AS statistics_owner,
+    ( SELECT array_agg(a.attname ORDER BY a.attnum) AS array_agg
+           FROM (unnest(s.stxkeys) k(k)
+             JOIN pg_attribute a ON (((a.attrelid = s.stxrelid) AND (a.attnum = k.k))))) AS attnames,
+    s.stxkind AS kinds,
+    sd.stxdndistinct AS n_distinct,
+    sd.stxddependencies AS dependencies,
+    m.most_common_vals,
+    m.most_common_val_nulls,
+    m.most_common_freqs,
+    m.most_common_base_freqs
+   FROM (((((pg_statistic_ext s
+     JOIN pg_class c ON ((c.oid = s.stxrelid)))
+     JOIN pg_statistic_ext_data sd ON ((s.oid = sd.stxoid)))
+     LEFT JOIN pg_namespace cn ON ((cn.oid = c.relnamespace)))
+     LEFT JOIN pg_namespace sn ON ((sn.oid = s.stxnamespace)))
+     LEFT JOIN LATERAL ( SELECT array_agg(pg_mcv_list_items."values") AS most_common_vals,
+            array_agg(pg_mcv_list_items.nulls) AS most_common_val_nulls,
+            array_agg(pg_mcv_list_items.frequency) AS most_common_freqs,
+            array_agg(pg_mcv_list_items.base_frequency) AS most_common_base_freqs
+           FROM pg_mcv_list_items(sd.stxdmcv) pg_mcv_list_items(index, "values", nulls, frequency, base_frequency)) m ON ((sd.stxdmcv IS NOT NULL)))
+  WHERE ((NOT (EXISTS ( SELECT 1
+           FROM (unnest(s.stxkeys) k(k)
+             JOIN pg_attribute a ON (((a.attrelid = s.stxrelid) AND (a.attnum = k.k))))
+          WHERE (NOT has_column_privilege(c.oid, a.attnum, 'select'::text))))) AND ((c.relrowsecurity = false) OR (NOT row_security_active(c.oid))));
 pg_tables| SELECT n.nspname AS schemaname,
     c.relname AS tablename,
     pg_get_userbyid(c.relowner) AS tableowner,
-- 
2.20.1

#32

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Tomas Vondra (#31)

Re: Multivariate MCV stats can leak data to unprivileged users

On Thu, Jun 13, 2019 at 07:37:45PM +0200, Tomas Vondra wrote:

...

OK, attached are patches fixing the issues reported by you and John
Naylor, and squashing the parts into just two patches (catalog split and
pg_stats_ext). Barring objections, I'll push those tomorrow.

I've renamed columns in the _data catalog from 'stx' to 'stxd', which I
think is appropriate given the "data" in catalog name.

I'm wondering if we should change the examples in SGML docs (say, in
planstats.sgml) to use the new pg_stats_ext view, instead of querying the
catalogs directly. I've tried doing that, but I found the results less
readable than what we currently have (especially for the MCV list, where
it'd require matching elements in multiple arrays). So I've left this
unchanged for now.

I've pushed those changes, after adding docs for the pg_stats_ext view.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#33

Dean Rasheed

dean.a.rasheed@gmail.com

over 6 years ago

In reply to: Tomas Vondra (#2)

Re: Multivariate MCV stats can leak data to unprivileged users

On Mon, 13 May 2019 at 23:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Fri, May 10, 2019 at 10:19:44AM +0100, Dean Rasheed wrote:

While working on 1aebfbea83c, I noticed that the new multivariate MCV
stats feature suffers from the same problem, and also the original
problems that were fixed in e2d4ef8de8 and earlier --- namely that a
user can see values in the MCV lists that they shouldn't see (values
from tables that they don't have privileges on).

I think there are 2 separate issues here:

1). The table pg_statistic_ext is accessible to anyone, so any user
can see the MCV lists of any table. I think we should give this the
same treatment as pg_statistic, and hide it behind a security barrier
view, revoking public access from the table.

2). The multivariate MCV stats planner code can be made to invoke
user-defined operators, so a user can create a leaky operator and use
it to reveal data values from the MCV lists even if they have no
permissions on the table.

Attached is a draft patch to fix (2), which hooks into
statext_is_compatible_clause().

I think that patch is good.

I realised that we forgot to push this second part, so I've just done so.

Regards,
Dean

#34

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Dean Rasheed (#33)

Re: Multivariate MCV stats can leak data to unprivileged users

On Sun, Jun 23, 2019 at 06:56:53PM +0100, Dean Rasheed wrote:

On Mon, 13 May 2019 at 23:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Fri, May 10, 2019 at 10:19:44AM +0100, Dean Rasheed wrote:

While working on 1aebfbea83c, I noticed that the new multivariate MCV
stats feature suffers from the same problem, and also the original
problems that were fixed in e2d4ef8de8 and earlier --- namely that a
user can see values in the MCV lists that they shouldn't see (values
from tables that they don't have privileges on).

I think there are 2 separate issues here:

1). The table pg_statistic_ext is accessible to anyone, so any user
can see the MCV lists of any table. I think we should give this the
same treatment as pg_statistic, and hide it behind a security barrier
view, revoking public access from the table.

2). The multivariate MCV stats planner code can be made to invoke
user-defined operators, so a user can create a leaky operator and use
it to reveal data values from the MCV lists even if they have no
permissions on the table.

Attached is a draft patch to fix (2), which hooks into
statext_is_compatible_clause().

I think that patch is good.

I realised that we forgot to push this second part, so I've just done so.

Whoops! Too many patches in this thread. Thanks for noticing.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services