pg_stats and range statistics

Started by Egor Rogovover 4 years ago25 messages

e.rogov@postgrespro.ru

over 4 years ago

1 attachment(s)

Hi,

Statistics for range types are not currently exposed in pg_stats view
(i.e. STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM).

Shouldn't they? If so, here is a patch for adding them.

The following is a simple example of what it looks like:

CREATE TABLE test(r int4range);
INSERT INTO test
SELECT int4range((random()*10)::integer,(10+random()*10)::integer)
FROM generate_series(1,10000);
SET default_statistics_target = 10;
ANALYZE test;

SELECT range_length_histogram, range_length_empty_frac,
range_bounds_histogram
FROM pg_stats
WHERE tablename = 'test' \gx

-[ RECORD 1
]-----------+------------------------------------------------------------------------------------------------------
range_length_histogram | {1,4,6,8,9,10,11,12,14,16,20}
range_length_empty_frac | {0.0036666666}
range_bounds_histogram |
{"[0,10)","[1,11)","[2,12)","[3,13)","[4,14)","[5,15)","[6,16)","[7,17)","[8,18)","[9,19)","[10,20)"}

Regards,
Egor Rogov.

Attachments:

pgstats.patchtext/plain; charset=UTF-8; name=pgstats.patch; x-mac-creator=0; x-mac-type=0Download

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f517a7d4af..4d037b590e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -12877,6 +12877,36 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        non-null elements.  (Null for scalar types.)
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_length_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of lengths of non-empty, non-null ranges.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_length_empty_frac</structfield> <type>float4</type>
+      </para>
+      <para>
+       A fraction of empty ranges.  (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_bounds_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       Histograms of lower and upper bounds of non-empty, non-null ranges,
+       combined into a single array of range values.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 999d984068..93fadfff15 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,28 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS range_length_empty_frac,
+        CASE
+            WHEN stakind1 = 7 THEN stavalues1
+            WHEN stakind2 = 7 THEN stavalues2
+            WHEN stakind3 = 7 THEN stavalues3
+            WHEN stakind4 = 7 THEN stavalues4
+            WHEN stakind5 = 7 THEN stavalues5
+        END AS range_bounds_histogram
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e5ab11275d..780aefdc26 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2409,7 +2409,31 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind4 = 5) THEN s.stanumbers4
             WHEN (s.stakind5 = 5) THEN s.stanumbers5
             ELSE NULL::real[]
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stavalues1
+            WHEN (s.stakind2 = 6) THEN s.stavalues2
+            WHEN (s.stakind3 = 6) THEN s.stavalues3
+            WHEN (s.stakind4 = 6) THEN s.stavalues4
+            WHEN (s.stakind5 = 6) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_length_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stanumbers1[1]
+            WHEN (s.stakind2 = 6) THEN s.stanumbers2[1]
+            WHEN (s.stakind3 = 6) THEN s.stanumbers3[1]
+            WHEN (s.stakind4 = 6) THEN s.stanumbers4[1]
+            WHEN (s.stakind5 = 6) THEN s.stanumbers5[1]
+            ELSE NULL::real
+        END AS range_length_empty_frac,
+        CASE
+            WHEN (s.stakind1 = 7) THEN s.stavalues1
+            WHEN (s.stakind2 = 7) THEN s.stavalues2
+            WHEN (s.stakind3 = 7) THEN s.stavalues3
+            WHEN (s.stakind4 = 7) THEN s.stavalues4
+            WHEN (s.stakind5 = 7) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_bounds_histogram
    FROM (((pg_statistic s
      JOIN pg_class c ON ((c.oid = s.starelid)))
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))

Tomas Vondra

tomas.vondra@enterprisedb.com

over 4 years ago

In reply to: Egor Rogov (#1)

Re: pg_stats and range statistics

On 6/18/21 6:22 PM, Egor Rogov wrote:

Hi,

Statistics for range types are not currently exposed in pg_stats view
(i.e. STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM).

Shouldn't they? If so, here is a patch for adding them.

I think they should be exposed - I don't see why not to do that. I
noticed this when working on the count-min sketch experiment too, so
thanks for this patch.

FWIW I've added the patch to the next CF:

https://commitfest.postgresql.org/33/3184/

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Soumyadeep Chakraborty

soumyadeep2007@gmail.com

over 4 years ago

In reply to: Tomas Vondra (#2)

1 attachment(s)

Re: pg_stats and range statistics

Hello,

This should have been added with [1]https://github.com/postgres/postgres/commit/918eee0c497c88260a2e107318843c9b1947bc6f.

Excerpt from the documentation:
"pg_stats is also designed to present the information in a more readable
format than the underlying catalog — at the cost that its schema must
be extended whenever new slot types are defined for pg_statistic." [2]https://www.postgresql.org/docs/devel/view-pg-stats.html

So, I added a reminder in pg_statistic.h.

Attached is v2 of this patch with some cosmetic changes. Renamed the columns a
bit and updated the docs to be a bit more descriptive.
(range_length_empty_frac -> empty_range_frac, range_bounds_histogram ->
range_bounds_histograms)

One question:

We do have the option of representing the histogram of lower bounds separately
from the histogram of upper bounds, as two separate view columns. Don't know if
there is much utility though and there is a fair bit of added complexity: see
below. Thoughts?

My attempts via SQL (unnest -> lower|upper -> array_agg) were futile given
unnest does not play nice with anyarray. For instance:

select unnest(stavalues1) from pg_statistic;
ERROR: cannot determine element type of "anyarray" argument

Maybe the only option is to write a UDF pg_get_{lower|upper}_bounds_histogram
which can do something similar to what calc_hist_selectivity does:

/*
* Convert histogram of ranges into histograms of its lower and upper
* bounds.
*/
nhist = hslot.nvalues;
hist_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
hist_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
for (i = 0; i < nhist; i++)
{
bool empty;

range_deserialize(rng_typcache, DatumGetRangeTypeP(hslot.values[i]),
&hist_lower[i], &hist_upper[i], &empty);
/* The histogram should not contain any empty ranges */
if (empty)
elog(ERROR, "bounds histogram contains an empty range");
}

This is looking good and ready.

[1]: https://github.com/postgres/postgres/commit/918eee0c497c88260a2e107318843c9b1947bc6f
[2]: https://www.postgresql.org/docs/devel/view-pg-stats.html

Regards,
Soumyadeep (VMware)

Attachments:

v2-pgstats.patchtext/x-patch; charset=US-ASCII; name=v2-pgstats.patchDownload

From 1f2ed0911eb3f7a53b16047cefb499692daf5ef6 Mon Sep 17 00:00:00 2001
From: Egor Rogov <e.rogov@postgrespro.ru>
Date: Sun, 11 Jul 2021 11:09:45 -0700
Subject: [PATCH v2 1/1] Display length and bounds histograms in pg_stats

Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these
slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.

TODO: catalog version bump
---
 doc/src/sgml/catalogs.sgml           | 32 ++++++++++++++++++++++++++++
 src/backend/catalog/system_views.sql | 23 +++++++++++++++++++-
 src/include/catalog/pg_statistic.h   |  3 +++
 src/test/regress/expected/rules.out  | 26 +++++++++++++++++++++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f517a7d4af..7168ff9a13 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -12877,6 +12877,38 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        non-null elements.  (Null for scalar types.)
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_length_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of the lengths of non-empty and non-null range values of a
+       range type column. (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>empty_range_frac</structfield> <type>float4</type>
+      </para>
+      <para>
+       Fraction of column entries whose values are empty ranges.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_bounds_histograms</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       Histograms of lower and upper bounds of non-empty, non-null ranges,
+       combined into a single array of range values. The lower and upper bounds
+       of each value correspond to the histograms of lower and upper bounds
+       respectively. (Null for non-range types.)
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 999d984068..d8bc622ad5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,28 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS empty_range_frac,
+        CASE
+            WHEN stakind1 = 7 THEN stavalues1
+            WHEN stakind2 = 7 THEN stavalues2
+            WHEN stakind3 = 7 THEN stavalues3
+            WHEN stakind4 = 7 THEN stavalues4
+            WHEN stakind5 = 7 THEN stavalues5
+            END AS range_bounds_histograms
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 4f95d7ade4..d0b4923424 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -152,6 +152,9 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  * data "kind" will appear in any particular slot.  Instead, search the
  * stakind fields to see if the desired data is available.  (The standard
  * function get_attstatsslot() may be used for this.)
+ *
+ * Note: The pg_stats view needs to be modified whenever a new slot kind is
+ * added to core.
  */
 
 /*
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e5ab11275d..1ce26aa717 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2409,7 +2409,31 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind4 = 5) THEN s.stanumbers4
             WHEN (s.stakind5 = 5) THEN s.stanumbers5
             ELSE NULL::real[]
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stavalues1
+            WHEN (s.stakind2 = 6) THEN s.stavalues2
+            WHEN (s.stakind3 = 6) THEN s.stavalues3
+            WHEN (s.stakind4 = 6) THEN s.stavalues4
+            WHEN (s.stakind5 = 6) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_length_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stanumbers1[1]
+            WHEN (s.stakind2 = 6) THEN s.stanumbers2[1]
+            WHEN (s.stakind3 = 6) THEN s.stanumbers3[1]
+            WHEN (s.stakind4 = 6) THEN s.stanumbers4[1]
+            WHEN (s.stakind5 = 6) THEN s.stanumbers5[1]
+            ELSE NULL::real
+        END AS empty_range_frac,
+        CASE
+            WHEN (s.stakind1 = 7) THEN s.stavalues1
+            WHEN (s.stakind2 = 7) THEN s.stavalues2
+            WHEN (s.stakind3 = 7) THEN s.stavalues3
+            WHEN (s.stakind4 = 7) THEN s.stavalues4
+            WHEN (s.stakind5 = 7) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_bounds_histograms
    FROM (((pg_statistic s
      JOIN pg_class c ON ((c.oid = s.starelid)))
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))
-- 
2.25.1

Egor Rogov

e.rogov@postgrespro.ru

over 4 years ago

In reply to: Soumyadeep Chakraborty (#3)

Re: pg_stats and range statistics

Hi,

thanks for the review and corrections.

On 11.07.2021 21:54, Soumyadeep Chakraborty wrote:

Hello,

This should have been added with [1].

Excerpt from the documentation:
"pg_stats is also designed to present the information in a more readable
format than the underlying catalog — at the cost that its schema must
be extended whenever new slot types are defined for pg_statistic." [2]

So, I added a reminder in pg_statistic.h.

Good point.

Attached is v2 of this patch with some cosmetic changes.

I wonder why "TODO: catalog version bump"? This patch doesn't change
catalog structure, or I miss something?

Renamed the columns a
bit and updated the docs to be a bit more descriptive.
(range_length_empty_frac -> empty_range_frac, range_bounds_histogram ->
range_bounds_histograms)

I intended to make the same prefix ("range_") for all columns concerned
with range types, although I'm fine with the proposed naming.

One question:

We do have the option of representing the histogram of lower bounds separately
from the histogram of upper bounds, as two separate view columns. Don't know if
there is much utility though and there is a fair bit of added complexity: see
below. Thoughts?

I thought about it too, and decided not to transform the underlying data
structure. As far as I can see, pg_stats never employed such
transformations. For example, STATISTIC_KIND_DECHIST is an array
containing the histogram followed by the average in its last element. It
is shown in pg_stats.elem_count_histogram as is, although it arguably
may be splitted into two fields. All in all, I believe pg_stats's job is
to "unpack" stavalues and stanumbers into meaningful fields, and not to
try to go deeper than that.

Show quoted text

My attempts via SQL (unnest -> lower|upper -> array_agg) were futile given
unnest does not play nice with anyarray. For instance:

select unnest(stavalues1) from pg_statistic;
ERROR: cannot determine element type of "anyarray" argument

Maybe the only option is to write a UDF pg_get_{lower|upper}_bounds_histogram
which can do something similar to what calc_hist_selectivity does:

/*
* Convert histogram of ranges into histograms of its lower and upper
* bounds.
*/
nhist = hslot.nvalues;
hist_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
hist_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
for (i = 0; i < nhist; i++)
{
bool empty;

range_deserialize(rng_typcache, DatumGetRangeTypeP(hslot.values[i]),
&hist_lower[i], &hist_upper[i], &empty);
/* The histogram should not contain any empty ranges */
if (empty)
elog(ERROR, "bounds histogram contains an empty range");
}

This is looking good and ready.

[1] https://github.com/postgres/postgres/commit/918eee0c497c88260a2e107318843c9b1947bc6f
[2] https://www.postgresql.org/docs/devel/view-pg-stats.html

Regards,
Soumyadeep (VMware)

Tomas Vondra

tomas.vondra@enterprisedb.com

over 4 years ago

In reply to: Egor Rogov (#4)

Re: pg_stats and range statistics

On 7/12/21 1:10 PM, Egor Rogov wrote:

Hi,

thanks for the review and corrections.

On 11.07.2021 21:54, Soumyadeep Chakraborty wrote:

Hello,

This should have been added with [1].

Excerpt from the documentation:
"pg_stats is also designed to present the information in a more readable
format than the underlying catalog — at the cost that its schema must
be extended whenever new slot types are defined for pg_statistic." [2]

So, I added a reminder in pg_statistic.h.

Good point.

Attached is v2 of this patch with some cosmetic changes.

I wonder why "TODO: catalog version bump"? This patch doesn't change
catalog structure, or I miss something?

It changes system_views.sql, which is catalog change, as it redefines
the pg_stats system view (it adds 3 more columns). So it changes what
you get after initdb, hence catversion has to be bumped.

Renamed the columns a
bit and updated the docs to be a bit more descriptive.
(range_length_empty_frac -> empty_range_frac, range_bounds_histogram ->
range_bounds_histograms)

I intended to make the same prefix ("range_") for all columns concerned
with range types, although I'm fine with the proposed naming.

Yeah, I'd vote to change empty_range_frac -> range_empty_frac.

One question:

We do have the option of representing the histogram of lower bounds
separately
from the histogram of upper bounds, as two separate view columns.
Don't know if
there is much utility though and there is a fair bit of added
complexity: see
below. Thoughts?

I thought about it too, and decided not to transform the underlying data
structure. As far as I can see, pg_stats never employed such
transformations. For example, STATISTIC_KIND_DECHIST is an array
containing the histogram followed by the average in its last element. It
is shown in pg_stats.elem_count_histogram as is, although it arguably
may be splitted into two fields. All in all, I believe pg_stats's job is
to "unpack" stavalues and stanumbers into meaningful fields, and not to
try to go deeper than that.

Not firm opinion, but the pg_stats is meant to be easier to
read/understand for humans. So far the transformation were simple
because all the data was fairly simple, but the range stuff may need
more complex transformation.

For example we do quite a bit more in pg_stats_ext views, because it
deals with multi-column stats.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Egor Rogov

e.rogov@postgrespro.ru

over 4 years ago

In reply to: Tomas Vondra (#5)

Re: pg_stats and range statistics

Hi Tomas,

On 12.07.2021 16:04, Tomas Vondra wrote:

On 7/12/21 1:10 PM, Egor Rogov wrote:

Hi,

thanks for the review and corrections.

On 11.07.2021 21:54, Soumyadeep Chakraborty wrote:

Hello,

This should have been added with [1].

Excerpt from the documentation:
"pg_stats is also designed to present the information in a more readable
format than the underlying catalog — at the cost that its schema must
be extended whenever new slot types are defined for pg_statistic." [2]

So, I added a reminder in pg_statistic.h.

Good point.

Attached is v2 of this patch with some cosmetic changes.

I wonder why "TODO: catalog version bump"? This patch doesn't change
catalog structure, or I miss something?

It changes system_views.sql, which is catalog change, as it redefines
the pg_stats system view (it adds 3 more columns). So it changes what
you get after initdb, hence catversion has to be bumped.

Renamed the columns a
bit and updated the docs to be a bit more descriptive.
(range_length_empty_frac -> empty_range_frac, range_bounds_histogram ->
range_bounds_histograms)

I intended to make the same prefix ("range_") for all columns concerned
with range types, although I'm fine with the proposed naming.

Yeah, I'd vote to change empty_range_frac -> range_empty_frac.

One question:

We do have the option of representing the histogram of lower bounds
separately
from the histogram of upper bounds, as two separate view columns.
Don't know if
there is much utility though and there is a fair bit of added
complexity: see
below. Thoughts?

I thought about it too, and decided not to transform the underlying data
structure. As far as I can see, pg_stats never employed such
transformations. For example, STATISTIC_KIND_DECHIST is an array
containing the histogram followed by the average in its last element. It
is shown in pg_stats.elem_count_histogram as is, although it arguably
may be splitted into two fields. All in all, I believe pg_stats's job is
to "unpack" stavalues and stanumbers into meaningful fields, and not to
try to go deeper than that.

Not firm opinion, but the pg_stats is meant to be easier to
read/understand for humans. So far the transformation were simple
because all the data was fairly simple, but the range stuff may need
more complex transformation.

For example we do quite a bit more in pg_stats_ext views, because it
deals with multi-column stats.

In pg_stats_ext, yes, but not in pg_stats (at least until now).

Since no one has expressed a strong desire for a more complex
transformation, should we proceed with the proposed approach (with
further renaming empty_range_frac -> range_empty_frac as you suggested)?
Or should we wait more for someone to weigh in?

Show quoted text

regards

Tomas Vondra

tomas.vondra@enterprisedb.com

almost 3 years ago

In reply to: Egor Rogov (#6)

3 attachment(s)

Re: pg_stats and range statistics

Hi Egor,

While reviewing a patch improving join estimates for ranges [1]https://commitfest.postgresql.org/41/3821/ I
realized we don't show stats for ranges in pg_stats, and I recalled we
had this patch.

I rebased the v2, and I decided to took a stab at showing separate
histograms for lower/upper histogram bounds. I believe it makes it way
more readable, which is what pg_stats is about IMHO.

This simply adds two functions, accepting/producing anyarray - one for
lower bounds, one for upper bounds. I don't think it can be done with a
plain subquery (or at least I don't know how).

Finally, it renames the empty_range_frac to start with range_, per the
earlier discussion. I wonder if the new column names for lower/upper
bounds (range_lower_bounds_histograms/range_upper_bounds_histograms) are
too long ...

regards

[1]: https://commitfest.postgresql.org/41/3821/

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Display-length-and-bounds-histograms-in-pg_-20230120.patchtext/x-patch; charset=UTF-8; name=0001-Display-length-and-bounds-histograms-in-pg_-20230120.patchDownload

From b339c0eab5616cec61e9d9e85398034861608d30 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri, 20 Jan 2023 20:50:41 +0100
Subject: [PATCH 1/3] Display length and bounds histograms in pg_stats

Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these
slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.

TODO: catalog version bump
---
 doc/src/sgml/catalogs.sgml           | 32 ++++++++++++++++++++++++++++
 src/backend/catalog/system_views.sql | 23 +++++++++++++++++++-
 src/include/catalog/pg_statistic.h   |  3 +++
 src/test/regress/expected/rules.out  | 26 +++++++++++++++++++++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054e..c8bd84c56eb 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9634,6 +9634,38 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
        User mapping specific options, as <quote>keyword=value</quote> strings
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_length_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of the lengths of non-empty and non-null range values of a
+       range type column. (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>empty_range_frac</structfield> <type>float4</type>
+      </para>
+      <para>
+       Fraction of column entries whose values are empty ranges.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_bounds_histograms</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       Histograms of lower and upper bounds of non-empty, non-null ranges,
+       combined into a single array of range values. The lower and upper bounds
+       of each value correspond to the histograms of lower and upper bounds
+       respectively. (Null for non-range types.)
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8608e3fa5b1..ccd6c7ffdb7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,28 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS empty_range_frac,
+        CASE
+            WHEN stakind1 = 7 THEN stavalues1
+            WHEN stakind2 = 7 THEN stavalues2
+            WHEN stakind3 = 7 THEN stavalues3
+            WHEN stakind4 = 7 THEN stavalues4
+            WHEN stakind5 = 7 THEN stavalues5
+            END AS range_bounds_histograms
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 8770c5b4c60..10401dece0d 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -152,6 +152,9 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  * data "kind" will appear in any particular slot.  Instead, search the
  * stakind fields to see if the desired data is available.  (The standard
  * function get_attstatsslot() may be used for this.)
+ *
+ * Note: The pg_stats view needs to be modified whenever a new slot kind is
+ * added to core.
  */
 
 /*
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e7a2f5856aa..ced5933271c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2436,7 +2436,31 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind4 = 5) THEN s.stanumbers4
             WHEN (s.stakind5 = 5) THEN s.stanumbers5
             ELSE NULL::real[]
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stavalues1
+            WHEN (s.stakind2 = 6) THEN s.stavalues2
+            WHEN (s.stakind3 = 6) THEN s.stavalues3
+            WHEN (s.stakind4 = 6) THEN s.stavalues4
+            WHEN (s.stakind5 = 6) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_length_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stanumbers1[1]
+            WHEN (s.stakind2 = 6) THEN s.stanumbers2[1]
+            WHEN (s.stakind3 = 6) THEN s.stanumbers3[1]
+            WHEN (s.stakind4 = 6) THEN s.stanumbers4[1]
+            WHEN (s.stakind5 = 6) THEN s.stanumbers5[1]
+            ELSE NULL::real
+        END AS empty_range_frac,
+        CASE
+            WHEN (s.stakind1 = 7) THEN s.stavalues1
+            WHEN (s.stakind2 = 7) THEN s.stavalues2
+            WHEN (s.stakind3 = 7) THEN s.stavalues3
+            WHEN (s.stakind4 = 7) THEN s.stavalues4
+            WHEN (s.stakind5 = 7) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_bounds_histograms
    FROM (((pg_statistic s
      JOIN pg_class c ON ((c.oid = s.starelid)))
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))
-- 
2.39.0

0002-add-functions-to-extract-lower-upper-ramge--20230120.patchtext/x-patch; charset=UTF-8; name=0002-add-functions-to-extract-lower-upper-ramge--20230120.patchDownload

From 9bd2bbef9dc092ab5fdace74e26f3afeaa62830d Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri, 20 Jan 2023 22:29:53 +0100
Subject: [PATCH 2/3] add functions to extract lower/upper ramge bounds

---
 src/backend/catalog/system_views.sql          |  19 ++-
 src/backend/utils/adt/rangetypes_typanalyze.c | 118 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |  10 ++
 3 files changed, 141 insertions(+), 6 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ccd6c7ffdb7..e3314550e32 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -259,12 +259,19 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind5 = 6 THEN stanumbers5[1]
         END AS empty_range_frac,
         CASE
-            WHEN stakind1 = 7 THEN stavalues1
-            WHEN stakind2 = 7 THEN stavalues2
-            WHEN stakind3 = 7 THEN stavalues3
-            WHEN stakind4 = 7 THEN stavalues4
-            WHEN stakind5 = 7 THEN stavalues5
-            END AS range_bounds_histograms
+            WHEN stakind1 = 7 THEN ranges_lower_bounds(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_lower_bounds(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_lower_bounds(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_lower_bounds(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_lower_bounds(stavalues5)
+            END AS range_lower_bounds_histograms,
+        CASE
+            WHEN stakind1 = 7 THEN ranges_upper_bounds(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_upper_bounds(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_upper_bounds(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_upper_bounds(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_upper_bounds(stavalues5)
+            END AS range_upper_bounds_histograms
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/backend/utils/adt/rangetypes_typanalyze.c b/src/backend/utils/adt/rangetypes_typanalyze.c
index 86810a1a6e6..a0097282fc0 100644
--- a/src/backend/utils/adt/rangetypes_typanalyze.c
+++ b/src/backend/utils/adt/rangetypes_typanalyze.c
@@ -26,6 +26,8 @@
 
 #include "catalog/pg_operator.h"
 #include "commands/vacuum.h"
+#include "utils/array.h"
+#include "utils/arrayaccess.h"
 #include "utils/float.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
@@ -427,3 +429,119 @@ compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
 	 * hashtable should also go away, as it used a child memory context.
 	 */
 }
+
+Datum
+ranges_lower_bounds(PG_FUNCTION_ARGS)
+{
+	AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
+	int			ndims = AARR_NDIM(array);
+	int		   *dims = AARR_DIMS(array);
+	Oid			element_type = AARR_ELEMTYPE(array);
+	int			i;
+	array_iter	iter;
+	int			nelems;
+	Datum	   *elems;
+	TypeCacheEntry *typentry;
+	TypeCacheEntry *typentry_element;
+
+	/* Get information about range type; note column might be a domain */
+	typentry = range_get_typcache(fcinfo, getBaseType(element_type));
+
+	if (typentry->typtype != TYPTYPE_RANGE)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("expected array of ranges")));
+
+	ndims = AARR_NDIM(array);
+	dims = AARR_DIMS(array);
+	nelems = ArrayGetNItems(ndims, dims);
+
+	elems = (Datum *) palloc(nelems * sizeof(Datum));
+
+	array_iter_setup(&iter, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum		itemvalue;
+		bool		isnull;
+		RangeBound	lower;
+		RangeBound	upper;
+		bool		empty;
+
+		/* Get source element, checking for NULL */
+		itemvalue = array_iter_next(&iter, &isnull, i,
+									typentry->typlen, typentry->typbyval,
+									typentry->typalign);
+
+		Assert(!isnull);
+
+		range_deserialize(typentry, (RangeType *) itemvalue, &lower, &upper, &empty);
+		elems[i] = lower.val;
+	}
+
+	typentry_element = typentry->rngelemtype;
+
+	PG_RETURN_ARRAYTYPE_P(construct_array(elems, nelems,
+						  typentry_element->type_id,
+						  typentry_element->typlen,
+						  typentry_element->typbyval,
+						  typentry_element->typalign));
+}
+
+Datum
+ranges_upper_bounds(PG_FUNCTION_ARGS)
+{
+	AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
+	int			ndims = AARR_NDIM(array);
+	int		   *dims = AARR_DIMS(array);
+	Oid			element_type = AARR_ELEMTYPE(array);
+	int			i;
+	array_iter	iter;
+	int			nelems;
+	Datum	   *elems;
+	TypeCacheEntry *typentry;
+	TypeCacheEntry *typentry_element;
+
+	/* Get information about range type; note column might be a domain */
+	typentry = range_get_typcache(fcinfo, getBaseType(element_type));
+
+	if (typentry->typtype != TYPTYPE_RANGE)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("expected array of ranges")));
+
+	ndims = AARR_NDIM(array);
+	dims = AARR_DIMS(array);
+	nelems = ArrayGetNItems(ndims, dims);
+
+	elems = (Datum *) palloc(nelems * sizeof(Datum));
+
+	array_iter_setup(&iter, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum		itemvalue;
+		bool		isnull;
+		RangeBound	lower;
+		RangeBound	upper;
+		bool		empty;
+
+		/* Get source element, checking for NULL */
+		itemvalue = array_iter_next(&iter, &isnull, i,
+									typentry->typlen, typentry->typbyval,
+									typentry->typalign);
+
+		Assert(!isnull);
+
+		range_deserialize(typentry, (RangeType *) itemvalue, &lower, &upper, &empty);
+		elems[i] = upper.val;
+	}
+
+	typentry_element = typentry->rngelemtype;
+
+	PG_RETURN_ARRAYTYPE_P(construct_array(elems, nelems,
+						  typentry_element->type_id,
+						  typentry_element->typlen,
+						  typentry_element->typbyval,
+						  typentry_element->typalign));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 86eb8e8c58a..161557e4cb2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11891,4 +11891,14 @@
   prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
   prosrc => 'brin_minmax_multi_summary_send' },
 
+{ oid => '9693', descr => 'lower bounds of ranges',
+  proname => 'ranges_lower_bounds', provolatile => 's',
+  prorettype => 'anyarray', proargtypes => 'anyarray',
+  prosrc => 'ranges_lower_bounds' },
+
+{ oid => '9694', descr => 'upper bounds of ranges',
+  proname => 'ranges_upper_bounds', provolatile => 's',
+  prorettype => 'anyarray', proargtypes => 'anyarray',
+  prosrc => 'ranges_upper_bounds' },
+
 ]
-- 
2.39.0

0003-rename-empty_range_frac-20230120.patchtext/x-patch; charset=UTF-8; name=0003-rename-empty_range_frac-20230120.patchDownload

From 600fed1faae369b8291ec3a7fc64f6ad8099c124 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri, 20 Jan 2023 22:40:28 +0100
Subject: [PATCH 3/3] rename empty_range_frac

---
 src/backend/catalog/system_views.sql | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e3314550e32..ac99d41f808 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -257,7 +257,7 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 6 THEN stanumbers3[1]
             WHEN stakind4 = 6 THEN stanumbers4[1]
             WHEN stakind5 = 6 THEN stanumbers5[1]
-        END AS empty_range_frac,
+        END AS range_empty_frac,
         CASE
             WHEN stakind1 = 7 THEN ranges_lower_bounds(stavalues1)
             WHEN stakind2 = 7 THEN ranges_lower_bounds(stavalues2)
-- 
2.39.0

Egor Rogov

e.rogov@postgrespro.ru

almost 3 years ago

In reply to: Tomas Vondra (#7)

4 attachment(s)

Re: pg_stats and range statistics

Hi Tomas,

On 21.01.2023 00:50, Tomas Vondra wrote:

Hi Egor,

While reviewing a patch improving join estimates for ranges [1] I
realized we don't show stats for ranges in pg_stats, and I recalled we
had this patch.

I rebased the v2, and I decided to took a stab at showing separate
histograms for lower/upper histogram bounds. I believe it makes it way
more readable, which is what pg_stats is about IMHO.

Thanks for looking into this.

I have to admit it looks much better this way, so +1.

This simply adds two functions, accepting/producing anyarray - one for
lower bounds, one for upper bounds. I don't think it can be done with a
plain subquery (or at least I don't know how).

Anyarray is an alien to SQL, so functions are well justified here. What
makes me a bit uneasy is two almost identical functions. Should we
consider other options like a function with an additional parameter or a
function returning an array of bounds arrays (which is somewhat
wasteful, but probably it doesn't matter much here)?

Finally, it renames the empty_range_frac to start with range_, per the
earlier discussion. I wonder if the new column names for lower/upper
bounds (range_lower_bounds_histograms/range_upper_bounds_histograms) are
too long ...

It seems so. The ending -s should be left out since it's a single
histogram now. And I think that
range_lower_histogram/range_upper_histogram are descriptive enough.

I'm adding one more patch to shorten the column names, refresh the docs,
and make 'make check' happy (unfortunately, we have to edit
src/regress/expected/rules.out every time pg_stats definition changes).

Show quoted text

regards

[1] https://commitfest.postgresql.org/41/3821/

Attachments:

0001-Display-length-and-bounds-histograms-in-pg_-20230120.patchtext/plain; charset=UTF-8; name=0001-Display-length-and-bounds-histograms-in-pg_-20230120.patchDownload

From b339c0eab5616cec61e9d9e85398034861608d30 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri, 20 Jan 2023 20:50:41 +0100
Subject: [PATCH 1/3] Display length and bounds histograms in pg_stats

Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these
slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.

TODO: catalog version bump
---
 doc/src/sgml/catalogs.sgml           | 32 ++++++++++++++++++++++++++++
 src/backend/catalog/system_views.sql | 23 +++++++++++++++++++-
 src/include/catalog/pg_statistic.h   |  3 +++
 src/test/regress/expected/rules.out  | 26 +++++++++++++++++++++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054e..c8bd84c56eb 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9634,6 +9634,38 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
        User mapping specific options, as <quote>keyword=value</quote> strings
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_length_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of the lengths of non-empty and non-null range values of a
+       range type column. (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>empty_range_frac</structfield> <type>float4</type>
+      </para>
+      <para>
+       Fraction of column entries whose values are empty ranges.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_bounds_histograms</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       Histograms of lower and upper bounds of non-empty, non-null ranges,
+       combined into a single array of range values. The lower and upper bounds
+       of each value correspond to the histograms of lower and upper bounds
+       respectively. (Null for non-range types.)
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8608e3fa5b1..ccd6c7ffdb7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,28 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS empty_range_frac,
+        CASE
+            WHEN stakind1 = 7 THEN stavalues1
+            WHEN stakind2 = 7 THEN stavalues2
+            WHEN stakind3 = 7 THEN stavalues3
+            WHEN stakind4 = 7 THEN stavalues4
+            WHEN stakind5 = 7 THEN stavalues5
+            END AS range_bounds_histograms
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 8770c5b4c60..10401dece0d 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -152,6 +152,9 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  * data "kind" will appear in any particular slot.  Instead, search the
  * stakind fields to see if the desired data is available.  (The standard
  * function get_attstatsslot() may be used for this.)
+ *
+ * Note: The pg_stats view needs to be modified whenever a new slot kind is
+ * added to core.
  */
 
 /*
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e7a2f5856aa..ced5933271c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2436,7 +2436,31 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind4 = 5) THEN s.stanumbers4
             WHEN (s.stakind5 = 5) THEN s.stanumbers5
             ELSE NULL::real[]
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stavalues1
+            WHEN (s.stakind2 = 6) THEN s.stavalues2
+            WHEN (s.stakind3 = 6) THEN s.stavalues3
+            WHEN (s.stakind4 = 6) THEN s.stavalues4
+            WHEN (s.stakind5 = 6) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_length_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stanumbers1[1]
+            WHEN (s.stakind2 = 6) THEN s.stanumbers2[1]
+            WHEN (s.stakind3 = 6) THEN s.stanumbers3[1]
+            WHEN (s.stakind4 = 6) THEN s.stanumbers4[1]
+            WHEN (s.stakind5 = 6) THEN s.stanumbers5[1]
+            ELSE NULL::real
+        END AS empty_range_frac,
+        CASE
+            WHEN (s.stakind1 = 7) THEN s.stavalues1
+            WHEN (s.stakind2 = 7) THEN s.stavalues2
+            WHEN (s.stakind3 = 7) THEN s.stavalues3
+            WHEN (s.stakind4 = 7) THEN s.stavalues4
+            WHEN (s.stakind5 = 7) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_bounds_histograms
    FROM (((pg_statistic s
      JOIN pg_class c ON ((c.oid = s.starelid)))
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))
-- 
2.39.0

0002-add-functions-to-extract-lower-upper-ramge--20230120.patchtext/plain; charset=UTF-8; name=0002-add-functions-to-extract-lower-upper-ramge--20230120.patchDownload

From 9bd2bbef9dc092ab5fdace74e26f3afeaa62830d Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri, 20 Jan 2023 22:29:53 +0100
Subject: [PATCH 2/3] add functions to extract lower/upper ramge bounds

---
 src/backend/catalog/system_views.sql          |  19 ++-
 src/backend/utils/adt/rangetypes_typanalyze.c | 118 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |  10 ++
 3 files changed, 141 insertions(+), 6 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ccd6c7ffdb7..e3314550e32 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -259,12 +259,19 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind5 = 6 THEN stanumbers5[1]
         END AS empty_range_frac,
         CASE
-            WHEN stakind1 = 7 THEN stavalues1
-            WHEN stakind2 = 7 THEN stavalues2
-            WHEN stakind3 = 7 THEN stavalues3
-            WHEN stakind4 = 7 THEN stavalues4
-            WHEN stakind5 = 7 THEN stavalues5
-            END AS range_bounds_histograms
+            WHEN stakind1 = 7 THEN ranges_lower_bounds(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_lower_bounds(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_lower_bounds(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_lower_bounds(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_lower_bounds(stavalues5)
+            END AS range_lower_bounds_histograms,
+        CASE
+            WHEN stakind1 = 7 THEN ranges_upper_bounds(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_upper_bounds(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_upper_bounds(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_upper_bounds(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_upper_bounds(stavalues5)
+            END AS range_upper_bounds_histograms
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/backend/utils/adt/rangetypes_typanalyze.c b/src/backend/utils/adt/rangetypes_typanalyze.c
index 86810a1a6e6..a0097282fc0 100644
--- a/src/backend/utils/adt/rangetypes_typanalyze.c
+++ b/src/backend/utils/adt/rangetypes_typanalyze.c
@@ -26,6 +26,8 @@
 
 #include "catalog/pg_operator.h"
 #include "commands/vacuum.h"
+#include "utils/array.h"
+#include "utils/arrayaccess.h"
 #include "utils/float.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
@@ -427,3 +429,119 @@ compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
 	 * hashtable should also go away, as it used a child memory context.
 	 */
 }
+
+Datum
+ranges_lower_bounds(PG_FUNCTION_ARGS)
+{
+	AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
+	int			ndims = AARR_NDIM(array);
+	int		   *dims = AARR_DIMS(array);
+	Oid			element_type = AARR_ELEMTYPE(array);
+	int			i;
+	array_iter	iter;
+	int			nelems;
+	Datum	   *elems;
+	TypeCacheEntry *typentry;
+	TypeCacheEntry *typentry_element;
+
+	/* Get information about range type; note column might be a domain */
+	typentry = range_get_typcache(fcinfo, getBaseType(element_type));
+
+	if (typentry->typtype != TYPTYPE_RANGE)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("expected array of ranges")));
+
+	ndims = AARR_NDIM(array);
+	dims = AARR_DIMS(array);
+	nelems = ArrayGetNItems(ndims, dims);
+
+	elems = (Datum *) palloc(nelems * sizeof(Datum));
+
+	array_iter_setup(&iter, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum		itemvalue;
+		bool		isnull;
+		RangeBound	lower;
+		RangeBound	upper;
+		bool		empty;
+
+		/* Get source element, checking for NULL */
+		itemvalue = array_iter_next(&iter, &isnull, i,
+									typentry->typlen, typentry->typbyval,
+									typentry->typalign);
+
+		Assert(!isnull);
+
+		range_deserialize(typentry, (RangeType *) itemvalue, &lower, &upper, &empty);
+		elems[i] = lower.val;
+	}
+
+	typentry_element = typentry->rngelemtype;
+
+	PG_RETURN_ARRAYTYPE_P(construct_array(elems, nelems,
+						  typentry_element->type_id,
+						  typentry_element->typlen,
+						  typentry_element->typbyval,
+						  typentry_element->typalign));
+}
+
+Datum
+ranges_upper_bounds(PG_FUNCTION_ARGS)
+{
+	AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
+	int			ndims = AARR_NDIM(array);
+	int		   *dims = AARR_DIMS(array);
+	Oid			element_type = AARR_ELEMTYPE(array);
+	int			i;
+	array_iter	iter;
+	int			nelems;
+	Datum	   *elems;
+	TypeCacheEntry *typentry;
+	TypeCacheEntry *typentry_element;
+
+	/* Get information about range type; note column might be a domain */
+	typentry = range_get_typcache(fcinfo, getBaseType(element_type));
+
+	if (typentry->typtype != TYPTYPE_RANGE)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("expected array of ranges")));
+
+	ndims = AARR_NDIM(array);
+	dims = AARR_DIMS(array);
+	nelems = ArrayGetNItems(ndims, dims);
+
+	elems = (Datum *) palloc(nelems * sizeof(Datum));
+
+	array_iter_setup(&iter, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum		itemvalue;
+		bool		isnull;
+		RangeBound	lower;
+		RangeBound	upper;
+		bool		empty;
+
+		/* Get source element, checking for NULL */
+		itemvalue = array_iter_next(&iter, &isnull, i,
+									typentry->typlen, typentry->typbyval,
+									typentry->typalign);
+
+		Assert(!isnull);
+
+		range_deserialize(typentry, (RangeType *) itemvalue, &lower, &upper, &empty);
+		elems[i] = upper.val;
+	}
+
+	typentry_element = typentry->rngelemtype;
+
+	PG_RETURN_ARRAYTYPE_P(construct_array(elems, nelems,
+						  typentry_element->type_id,
+						  typentry_element->typlen,
+						  typentry_element->typbyval,
+						  typentry_element->typalign));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 86eb8e8c58a..161557e4cb2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11891,4 +11891,14 @@
   prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
   prosrc => 'brin_minmax_multi_summary_send' },
 
+{ oid => '9693', descr => 'lower bounds of ranges',
+  proname => 'ranges_lower_bounds', provolatile => 's',
+  prorettype => 'anyarray', proargtypes => 'anyarray',
+  prosrc => 'ranges_lower_bounds' },
+
+{ oid => '9694', descr => 'upper bounds of ranges',
+  proname => 'ranges_upper_bounds', provolatile => 's',
+  prorettype => 'anyarray', proargtypes => 'anyarray',
+  prosrc => 'ranges_upper_bounds' },
+
 ]
-- 
2.39.0

0003-rename-empty_range_frac-20230120.patchtext/plain; charset=UTF-8; name=0003-rename-empty_range_frac-20230120.patchDownload

From 600fed1faae369b8291ec3a7fc64f6ad8099c124 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri, 20 Jan 2023 22:40:28 +0100
Subject: [PATCH 3/3] rename empty_range_frac

---
 src/backend/catalog/system_views.sql | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e3314550e32..ac99d41f808 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -257,7 +257,7 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 6 THEN stanumbers3[1]
             WHEN stakind4 = 6 THEN stanumbers4[1]
             WHEN stakind5 = 6 THEN stanumbers5[1]
-        END AS empty_range_frac,
+        END AS range_empty_frac,
         CASE
             WHEN stakind1 = 7 THEN ranges_lower_bounds(stavalues1)
             WHEN stakind2 = 7 THEN ranges_lower_bounds(stavalues2)
-- 
2.39.0

0004-shorten_names-refresh_docs-tests-20230121.patchtext/plain; charset=UTF-8; name=0004-shorten_names-refresh_docs-tests-20230121.patchDownload

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c8bd84c56e..e0851c52b4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9647,7 +9647,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>empty_range_frac</structfield> <type>float4</type>
+       <structfield>range_empty_frac</structfield> <type>float4</type>
       </para>
       <para>
        Fraction of column entries whose values are empty ranges.
@@ -9657,13 +9657,21 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>range_bounds_histograms</structfield> <type>anyarray</type>
+       <structfield>range_lower_histogram</structfield> <type>anyarray</type>
       </para>
       <para>
-       Histograms of lower and upper bounds of non-empty, non-null ranges,
-       combined into a single array of range values. The lower and upper bounds
-       of each value correspond to the histograms of lower and upper bounds
-       respectively. (Null for non-range types.)
+       A histogram of lower bounds of non-empty and non-null range values.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_upper_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of upper bounds of non-empty and non-null range values.
+       (Null for non-range types.)
       </para></entry>
      </row>
     </tbody>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ac99d41f80..b3b9e64ae2 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -264,14 +264,14 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 7 THEN ranges_lower_bounds(stavalues3)
             WHEN stakind4 = 7 THEN ranges_lower_bounds(stavalues4)
             WHEN stakind5 = 7 THEN ranges_lower_bounds(stavalues5)
-            END AS range_lower_bounds_histograms,
+            END AS range_lower_histogram,
         CASE
             WHEN stakind1 = 7 THEN ranges_upper_bounds(stavalues1)
             WHEN stakind2 = 7 THEN ranges_upper_bounds(stavalues2)
             WHEN stakind3 = 7 THEN ranges_upper_bounds(stavalues3)
             WHEN stakind4 = 7 THEN ranges_upper_bounds(stavalues4)
             WHEN stakind5 = 7 THEN ranges_upper_bounds(stavalues5)
-            END AS range_upper_bounds_histograms
+            END AS range_upper_histogram
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index ced5933271..26087dfdf5 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2452,15 +2452,23 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind4 = 6) THEN s.stanumbers4[1]
             WHEN (s.stakind5 = 6) THEN s.stanumbers5[1]
             ELSE NULL::real
-        END AS empty_range_frac,
+        END AS range_empty_frac,
         CASE
-            WHEN (s.stakind1 = 7) THEN s.stavalues1
-            WHEN (s.stakind2 = 7) THEN s.stavalues2
-            WHEN (s.stakind3 = 7) THEN s.stavalues3
-            WHEN (s.stakind4 = 7) THEN s.stavalues4
-            WHEN (s.stakind5 = 7) THEN s.stavalues5
+            WHEN (s.stakind1 = 7) THEN ranges_lower_bounds(s.stavalues1)
+            WHEN (s.stakind2 = 7) THEN ranges_lower_bounds(s.stavalues2)
+            WHEN (s.stakind3 = 7) THEN ranges_lower_bounds(s.stavalues3)
+            WHEN (s.stakind4 = 7) THEN ranges_lower_bounds(s.stavalues4)
+            WHEN (s.stakind5 = 7) THEN ranges_lower_bounds(s.stavalues5)
             ELSE NULL::anyarray
-        END AS range_bounds_histograms
+        END AS range_lower_histogram,
+        CASE
+            WHEN (s.stakind1 = 7) THEN ranges_upper_bounds(s.stavalues1)
+            WHEN (s.stakind2 = 7) THEN ranges_upper_bounds(s.stavalues2)
+            WHEN (s.stakind3 = 7) THEN ranges_upper_bounds(s.stavalues3)
+            WHEN (s.stakind4 = 7) THEN ranges_upper_bounds(s.stavalues4)
+            WHEN (s.stakind5 = 7) THEN ranges_upper_bounds(s.stavalues5)
+            ELSE NULL::anyarray
+        END AS range_upper_histogram
    FROM (((pg_statistic s
      JOIN pg_class c ON ((c.oid = s.starelid)))
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))

Tomas Vondra

tomas.vondra@enterprisedb.com

almost 3 years ago

In reply to: Egor Rogov (#8)

2 attachment(s)

Re: pg_stats and range statistics

On 1/21/23 19:53, Egor Rogov wrote:

Hi Tomas,

On 21.01.2023 00:50, Tomas Vondra wrote:

Hi Egor,

While reviewing a patch improving join estimates for ranges [1] I
realized we don't show stats for ranges in pg_stats, and I recalled we
had this patch.

I rebased the v2, and I decided to took a stab at showing separate
histograms for lower/upper histogram bounds. I believe it makes it way
more readable, which is what pg_stats is about IMHO.

Thanks for looking into this.

I have to admit it looks much better this way, so +1.

OK, good to hear.

This simply adds two functions, accepting/producing anyarray - one for
lower bounds, one for upper bounds. I don't think it can be done with a
plain subquery (or at least I don't know how).

Anyarray is an alien to SQL, so functions are well justified here. What
makes me a bit uneasy is two almost identical functions. Should we
consider other options like a function with an additional parameter or a
function returning an array of bounds arrays (which is somewhat
wasteful, but probably it doesn't matter much here)?

I thought about that, but I think the alternatives (e.g. a single
function with a parameter determining which boundary to return). But I
don't think it's better.

Moreover, I think this is pretty similar to lower/upper, which already
work on range values. So if we have separate functions for that, we
should do the same thing here.

I renamed the functions to ranges_lower/ranges_upper, but maybe why not
to even call the functions lower/upper too?

The main trouble with the function I can think of is that we only have
anyarray type, not anyrangearray. So the functions will get called for
arbitrary array, and the check that it's array of ranges happens inside.
I'm not sure if that's a good or bad idea, or what would it take to add
a new polymorphic type ...

For now I at least kept "ranges_" to make it less likely.

Finally, it renames the empty_range_frac to start with range_, per the
earlier discussion. I wonder if the new column names for lower/upper
bounds (range_lower_bounds_histograms/range_upper_bounds_histograms) are
too long ...

It seems so. The ending -s should be left out since it's a single
histogram now. And I think that
range_lower_histogram/range_upper_histogram are descriptive enough.

I'm adding one more patch to shorten the column names, refresh the docs,
and make 'make check' happy (unfortunately, we have to edit
src/regress/expected/rules.out every time pg_stats definition changes).

Thanks. I noticed the docs were added to pg_user_mapping by mistake, not
to pg_stats. So I fixed that, and I also added the new functions.

Finally, I reordered the fields a bit - moved range_empty_frac to keep
the histogram fields together.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Display-length-and-bounds-histograms-in-pg_-20230122.patchtext/x-patch; charset=UTF-8; name=0001-Display-length-and-bounds-histograms-in-pg_-20230122.patchDownload

From 8049d2b7e5d1636d5fb2b7d421d6b29a39389fb3 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri, 20 Jan 2023 20:50:41 +0100
Subject: [PATCH 1/2] Display length and bounds histograms in pg_stats

Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these
slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.

TODO: catalog version bump
---
 doc/src/sgml/catalogs.sgml                    |  40 ++++++
 src/backend/catalog/system_views.sql          |  30 ++++-
 src/backend/utils/adt/rangetypes_typanalyze.c | 118 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |  10 ++
 src/include/catalog/pg_statistic.h            |   3 +
 src/test/regress/expected/rules.out           |  34 ++++-
 6 files changed, 233 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..e0851c52b4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9634,6 +9634,46 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
        User mapping specific options, as <quote>keyword=value</quote> strings
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_length_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of the lengths of non-empty and non-null range values of a
+       range type column. (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_empty_frac</structfield> <type>float4</type>
+      </para>
+      <para>
+       Fraction of column entries whose values are empty ranges.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_lower_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of lower bounds of non-empty and non-null range values.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_upper_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of upper bounds of non-empty and non-null range values.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8608e3fa5b..b3b9e64ae2 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,35 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS range_empty_frac,
+        CASE
+            WHEN stakind1 = 7 THEN ranges_lower_bounds(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_lower_bounds(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_lower_bounds(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_lower_bounds(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_lower_bounds(stavalues5)
+            END AS range_lower_histogram,
+        CASE
+            WHEN stakind1 = 7 THEN ranges_upper_bounds(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_upper_bounds(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_upper_bounds(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_upper_bounds(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_upper_bounds(stavalues5)
+            END AS range_upper_histogram
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/backend/utils/adt/rangetypes_typanalyze.c b/src/backend/utils/adt/rangetypes_typanalyze.c
index 86810a1a6e..a0097282fc 100644
--- a/src/backend/utils/adt/rangetypes_typanalyze.c
+++ b/src/backend/utils/adt/rangetypes_typanalyze.c
@@ -26,6 +26,8 @@
 
 #include "catalog/pg_operator.h"
 #include "commands/vacuum.h"
+#include "utils/array.h"
+#include "utils/arrayaccess.h"
 #include "utils/float.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
@@ -427,3 +429,119 @@ compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
 	 * hashtable should also go away, as it used a child memory context.
 	 */
 }
+
+Datum
+ranges_lower_bounds(PG_FUNCTION_ARGS)
+{
+	AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
+	int			ndims = AARR_NDIM(array);
+	int		   *dims = AARR_DIMS(array);
+	Oid			element_type = AARR_ELEMTYPE(array);
+	int			i;
+	array_iter	iter;
+	int			nelems;
+	Datum	   *elems;
+	TypeCacheEntry *typentry;
+	TypeCacheEntry *typentry_element;
+
+	/* Get information about range type; note column might be a domain */
+	typentry = range_get_typcache(fcinfo, getBaseType(element_type));
+
+	if (typentry->typtype != TYPTYPE_RANGE)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("expected array of ranges")));
+
+	ndims = AARR_NDIM(array);
+	dims = AARR_DIMS(array);
+	nelems = ArrayGetNItems(ndims, dims);
+
+	elems = (Datum *) palloc(nelems * sizeof(Datum));
+
+	array_iter_setup(&iter, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum		itemvalue;
+		bool		isnull;
+		RangeBound	lower;
+		RangeBound	upper;
+		bool		empty;
+
+		/* Get source element, checking for NULL */
+		itemvalue = array_iter_next(&iter, &isnull, i,
+									typentry->typlen, typentry->typbyval,
+									typentry->typalign);
+
+		Assert(!isnull);
+
+		range_deserialize(typentry, (RangeType *) itemvalue, &lower, &upper, &empty);
+		elems[i] = lower.val;
+	}
+
+	typentry_element = typentry->rngelemtype;
+
+	PG_RETURN_ARRAYTYPE_P(construct_array(elems, nelems,
+						  typentry_element->type_id,
+						  typentry_element->typlen,
+						  typentry_element->typbyval,
+						  typentry_element->typalign));
+}
+
+Datum
+ranges_upper_bounds(PG_FUNCTION_ARGS)
+{
+	AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
+	int			ndims = AARR_NDIM(array);
+	int		   *dims = AARR_DIMS(array);
+	Oid			element_type = AARR_ELEMTYPE(array);
+	int			i;
+	array_iter	iter;
+	int			nelems;
+	Datum	   *elems;
+	TypeCacheEntry *typentry;
+	TypeCacheEntry *typentry_element;
+
+	/* Get information about range type; note column might be a domain */
+	typentry = range_get_typcache(fcinfo, getBaseType(element_type));
+
+	if (typentry->typtype != TYPTYPE_RANGE)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("expected array of ranges")));
+
+	ndims = AARR_NDIM(array);
+	dims = AARR_DIMS(array);
+	nelems = ArrayGetNItems(ndims, dims);
+
+	elems = (Datum *) palloc(nelems * sizeof(Datum));
+
+	array_iter_setup(&iter, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum		itemvalue;
+		bool		isnull;
+		RangeBound	lower;
+		RangeBound	upper;
+		bool		empty;
+
+		/* Get source element, checking for NULL */
+		itemvalue = array_iter_next(&iter, &isnull, i,
+									typentry->typlen, typentry->typbyval,
+									typentry->typalign);
+
+		Assert(!isnull);
+
+		range_deserialize(typentry, (RangeType *) itemvalue, &lower, &upper, &empty);
+		elems[i] = upper.val;
+	}
+
+	typentry_element = typentry->rngelemtype;
+
+	PG_RETURN_ARRAYTYPE_P(construct_array(elems, nelems,
+						  typentry_element->type_id,
+						  typentry_element->typlen,
+						  typentry_element->typbyval,
+						  typentry_element->typalign));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 86eb8e8c58..161557e4cb 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11891,4 +11891,14 @@
   prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
   prosrc => 'brin_minmax_multi_summary_send' },
 
+{ oid => '9693', descr => 'lower bounds of ranges',
+  proname => 'ranges_lower_bounds', provolatile => 's',
+  prorettype => 'anyarray', proargtypes => 'anyarray',
+  prosrc => 'ranges_lower_bounds' },
+
+{ oid => '9694', descr => 'upper bounds of ranges',
+  proname => 'ranges_upper_bounds', provolatile => 's',
+  prorettype => 'anyarray', proargtypes => 'anyarray',
+  prosrc => 'ranges_upper_bounds' },
+
 ]
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 8770c5b4c6..10401dece0 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -152,6 +152,9 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  * data "kind" will appear in any particular slot.  Instead, search the
  * stakind fields to see if the desired data is available.  (The standard
  * function get_attstatsslot() may be used for this.)
+ *
+ * Note: The pg_stats view needs to be modified whenever a new slot kind is
+ * added to core.
  */
 
 /*
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index e7a2f5856a..26087dfdf5 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2436,7 +2436,39 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind4 = 5) THEN s.stanumbers4
             WHEN (s.stakind5 = 5) THEN s.stanumbers5
             ELSE NULL::real[]
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stavalues1
+            WHEN (s.stakind2 = 6) THEN s.stavalues2
+            WHEN (s.stakind3 = 6) THEN s.stavalues3
+            WHEN (s.stakind4 = 6) THEN s.stavalues4
+            WHEN (s.stakind5 = 6) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_length_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stanumbers1[1]
+            WHEN (s.stakind2 = 6) THEN s.stanumbers2[1]
+            WHEN (s.stakind3 = 6) THEN s.stanumbers3[1]
+            WHEN (s.stakind4 = 6) THEN s.stanumbers4[1]
+            WHEN (s.stakind5 = 6) THEN s.stanumbers5[1]
+            ELSE NULL::real
+        END AS range_empty_frac,
+        CASE
+            WHEN (s.stakind1 = 7) THEN ranges_lower_bounds(s.stavalues1)
+            WHEN (s.stakind2 = 7) THEN ranges_lower_bounds(s.stavalues2)
+            WHEN (s.stakind3 = 7) THEN ranges_lower_bounds(s.stavalues3)
+            WHEN (s.stakind4 = 7) THEN ranges_lower_bounds(s.stavalues4)
+            WHEN (s.stakind5 = 7) THEN ranges_lower_bounds(s.stavalues5)
+            ELSE NULL::anyarray
+        END AS range_lower_histogram,
+        CASE
+            WHEN (s.stakind1 = 7) THEN ranges_upper_bounds(s.stavalues1)
+            WHEN (s.stakind2 = 7) THEN ranges_upper_bounds(s.stavalues2)
+            WHEN (s.stakind3 = 7) THEN ranges_upper_bounds(s.stavalues3)
+            WHEN (s.stakind4 = 7) THEN ranges_upper_bounds(s.stavalues4)
+            WHEN (s.stakind5 = 7) THEN ranges_upper_bounds(s.stavalues5)
+            ELSE NULL::anyarray
+        END AS range_upper_histogram
    FROM (((pg_statistic s
      JOIN pg_class c ON ((c.oid = s.starelid)))
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))
-- 
2.39.0

0002-tweaks-20230122.patchtext/x-patch; charset=UTF-8; name=0002-tweaks-20230122.patchDownload

From 9e95cf8025f37b819a3396fa24e0ccac92f18ce0 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Sun, 22 Jan 2023 17:28:30 +0100
Subject: [PATCH 2/2] tweaks

---
 doc/src/sgml/catalogs.sgml                    | 40 -------------------
 doc/src/sgml/func.sgml                        | 36 +++++++++++++++++
 doc/src/sgml/system-views.sgml                | 40 +++++++++++++++++++
 src/backend/catalog/system_views.sql          | 34 ++++++++--------
 src/backend/utils/adt/rangetypes_typanalyze.c |  6 +++
 src/include/catalog/pg_proc.dat               |  4 +-
 src/include/catalog/pg_statistic.h            |  3 --
 src/test/regress/expected/rules.out           | 36 ++++++++---------
 8 files changed, 119 insertions(+), 80 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e0851c52b4..c1e4048054 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9634,46 +9634,6 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
        User mapping specific options, as <quote>keyword=value</quote> strings
       </para></entry>
      </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>range_length_histogram</structfield> <type>anyarray</type>
-      </para>
-      <para>
-       A histogram of the lengths of non-empty and non-null range values of a
-       range type column. (Null for non-range types.)
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>range_empty_frac</structfield> <type>float4</type>
-      </para>
-      <para>
-       Fraction of column entries whose values are empty ranges.
-       (Null for non-range types.)
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>range_lower_histogram</structfield> <type>anyarray</type>
-      </para>
-      <para>
-       A histogram of lower bounds of non-empty and non-null range values.
-       (Null for non-range types.)
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>range_upper_histogram</structfield> <type>anyarray</type>
-      </para>
-      <para>
-       A histogram of upper bounds of non-empty and non-null range values.
-       (Null for non-range types.)
-      </para></entry>
-     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b8dac9ef46..f8eb488b2c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19357,6 +19357,24 @@ SELECT NULLIF(value, '(none)') ...
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>ranges_lower</primary>
+        </indexterm>
+        <function>ranges_lower</function> ( <type>anyarray</type> )
+        <returnvalue>anyarray</returnvalue>
+       </para>
+       <para>
+        Extracts lower bounds of ranges in the array (<literal>NULL</literal> if
+        the range is empty or the lower bound is infinite).
+       </para>
+       <para>
+        <literal>lower(numrange(1.1,2.2))</literal>
+        <returnvalue>1.1</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -19375,6 +19393,24 @@ SELECT NULLIF(value, '(none)') ...
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>ranges_upper</primary>
+        </indexterm>
+        <function>ranges_upper</function> ( <type>anyarray</type> )
+        <returnvalue>anyarray</returnvalue>
+       </para>
+       <para>
+        Extracts upper bounds of ranges (<literal>NULL</literal> if the
+        range is empty or the upper bound is infinite).
+       </para>
+       <para>
+        <literal>upper(numrange(1.1,2.2))</literal>
+        <returnvalue>2.2</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 7c8fc3f654..a48d775467 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3783,6 +3783,46 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        non-null elements.  (Null for scalar types.)
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_length_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of the lengths of non-empty and non-null range values of a
+       range type column. (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_empty_frac</structfield> <type>float4</type>
+      </para>
+      <para>
+       Fraction of column entries whose values are empty ranges.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_lower_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of lower bounds of non-empty and non-null range values.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_upper_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of upper bounds of non-empty and non-null range values.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b3b9e64ae2..00e91e5d93 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -244,13 +244,6 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
         END AS elem_count_histogram,
-        CASE
-            WHEN stakind1 = 6 THEN stavalues1
-            WHEN stakind2 = 6 THEN stavalues2
-            WHEN stakind3 = 6 THEN stavalues3
-            WHEN stakind4 = 6 THEN stavalues4
-            WHEN stakind5 = 6 THEN stavalues5
-        END AS range_length_histogram,
         CASE
             WHEN stakind1 = 6 THEN stanumbers1[1]
             WHEN stakind2 = 6 THEN stanumbers2[1]
@@ -259,18 +252,25 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind5 = 6 THEN stanumbers5[1]
         END AS range_empty_frac,
         CASE
-            WHEN stakind1 = 7 THEN ranges_lower_bounds(stavalues1)
-            WHEN stakind2 = 7 THEN ranges_lower_bounds(stavalues2)
-            WHEN stakind3 = 7 THEN ranges_lower_bounds(stavalues3)
-            WHEN stakind4 = 7 THEN ranges_lower_bounds(stavalues4)
-            WHEN stakind5 = 7 THEN ranges_lower_bounds(stavalues5)
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 7 THEN ranges_lower(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_lower(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_lower(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_lower(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_lower(stavalues5)
             END AS range_lower_histogram,
         CASE
-            WHEN stakind1 = 7 THEN ranges_upper_bounds(stavalues1)
-            WHEN stakind2 = 7 THEN ranges_upper_bounds(stavalues2)
-            WHEN stakind3 = 7 THEN ranges_upper_bounds(stavalues3)
-            WHEN stakind4 = 7 THEN ranges_upper_bounds(stavalues4)
-            WHEN stakind5 = 7 THEN ranges_upper_bounds(stavalues5)
+            WHEN stakind1 = 7 THEN ranges_upper(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_upper(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_upper(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_upper(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_upper(stavalues5)
             END AS range_upper_histogram
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
diff --git a/src/backend/utils/adt/rangetypes_typanalyze.c b/src/backend/utils/adt/rangetypes_typanalyze.c
index a0097282fc..982f23ae22 100644
--- a/src/backend/utils/adt/rangetypes_typanalyze.c
+++ b/src/backend/utils/adt/rangetypes_typanalyze.c
@@ -430,6 +430,9 @@ compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
 	 */
 }
 
+/*
+ * ranges_lower_bounds() -- return array of lower bounds for ranges
+ */
 Datum
 ranges_lower_bounds(PG_FUNCTION_ARGS)
 {
@@ -488,6 +491,9 @@ ranges_lower_bounds(PG_FUNCTION_ARGS)
 						  typentry_element->typalign));
 }
 
+/*
+ * ranges_upper_bounds() -- return array of upper bounds for ranges
+ */
 Datum
 ranges_upper_bounds(PG_FUNCTION_ARGS)
 {
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 161557e4cb..c53e9fc669 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11892,12 +11892,12 @@
   prosrc => 'brin_minmax_multi_summary_send' },
 
 { oid => '9693', descr => 'lower bounds of ranges',
-  proname => 'ranges_lower_bounds', provolatile => 's',
+  proname => 'ranges_lower', provolatile => 's',
   prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'ranges_lower_bounds' },
 
 { oid => '9694', descr => 'upper bounds of ranges',
-  proname => 'ranges_upper_bounds', provolatile => 's',
+  proname => 'ranges_upper', provolatile => 's',
   prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'ranges_upper_bounds' },
 
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 10401dece0..8770c5b4c6 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -152,9 +152,6 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  * data "kind" will appear in any particular slot.  Instead, search the
  * stakind fields to see if the desired data is available.  (The standard
  * function get_attstatsslot() may be used for this.)
- *
- * Note: The pg_stats view needs to be modified whenever a new slot kind is
- * added to core.
  */
 
 /*
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 26087dfdf5..ea4d721e98 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2437,14 +2437,6 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind5 = 5) THEN s.stanumbers5
             ELSE NULL::real[]
         END AS elem_count_histogram,
-        CASE
-            WHEN (s.stakind1 = 6) THEN s.stavalues1
-            WHEN (s.stakind2 = 6) THEN s.stavalues2
-            WHEN (s.stakind3 = 6) THEN s.stavalues3
-            WHEN (s.stakind4 = 6) THEN s.stavalues4
-            WHEN (s.stakind5 = 6) THEN s.stavalues5
-            ELSE NULL::anyarray
-        END AS range_length_histogram,
         CASE
             WHEN (s.stakind1 = 6) THEN s.stanumbers1[1]
             WHEN (s.stakind2 = 6) THEN s.stanumbers2[1]
@@ -2454,19 +2446,27 @@ pg_stats| SELECT n.nspname AS schemaname,
             ELSE NULL::real
         END AS range_empty_frac,
         CASE
-            WHEN (s.stakind1 = 7) THEN ranges_lower_bounds(s.stavalues1)
-            WHEN (s.stakind2 = 7) THEN ranges_lower_bounds(s.stavalues2)
-            WHEN (s.stakind3 = 7) THEN ranges_lower_bounds(s.stavalues3)
-            WHEN (s.stakind4 = 7) THEN ranges_lower_bounds(s.stavalues4)
-            WHEN (s.stakind5 = 7) THEN ranges_lower_bounds(s.stavalues5)
+            WHEN (s.stakind1 = 6) THEN s.stavalues1
+            WHEN (s.stakind2 = 6) THEN s.stavalues2
+            WHEN (s.stakind3 = 6) THEN s.stavalues3
+            WHEN (s.stakind4 = 6) THEN s.stavalues4
+            WHEN (s.stakind5 = 6) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_length_histogram,
+        CASE
+            WHEN (s.stakind1 = 7) THEN ranges_lower(s.stavalues1)
+            WHEN (s.stakind2 = 7) THEN ranges_lower(s.stavalues2)
+            WHEN (s.stakind3 = 7) THEN ranges_lower(s.stavalues3)
+            WHEN (s.stakind4 = 7) THEN ranges_lower(s.stavalues4)
+            WHEN (s.stakind5 = 7) THEN ranges_lower(s.stavalues5)
             ELSE NULL::anyarray
         END AS range_lower_histogram,
         CASE
-            WHEN (s.stakind1 = 7) THEN ranges_upper_bounds(s.stavalues1)
-            WHEN (s.stakind2 = 7) THEN ranges_upper_bounds(s.stavalues2)
-            WHEN (s.stakind3 = 7) THEN ranges_upper_bounds(s.stavalues3)
-            WHEN (s.stakind4 = 7) THEN ranges_upper_bounds(s.stavalues4)
-            WHEN (s.stakind5 = 7) THEN ranges_upper_bounds(s.stavalues5)
+            WHEN (s.stakind1 = 7) THEN ranges_upper(s.stavalues1)
+            WHEN (s.stakind2 = 7) THEN ranges_upper(s.stavalues2)
+            WHEN (s.stakind3 = 7) THEN ranges_upper(s.stavalues3)
+            WHEN (s.stakind4 = 7) THEN ranges_upper(s.stavalues4)
+            WHEN (s.stakind5 = 7) THEN ranges_upper(s.stavalues5)
             ELSE NULL::anyarray
         END AS range_upper_histogram
    FROM (((pg_statistic s
-- 
2.39.0

#10

Justin Pryzby

pryzby@telsasoft.com

almost 3 years ago

In reply to: Tomas Vondra (#9)

Re: pg_stats and range statistics

On Sun, Jan 22, 2023 at 07:19:41PM +0100, Tomas Vondra wrote:

On 1/21/23 19:53, Egor Rogov wrote:

Hi Tomas,
On 21.01.2023 00:50, Tomas Vondra wrote:

This simply adds two functions, accepting/producing anyarray - one for
lower bounds, one for upper bounds. I don't think it can be done with a
plain subquery (or at least I don't know how).

Anyarray is an alien to SQL, so functions are well justified here. What
makes me a bit uneasy is two almost identical functions. Should we
consider other options like a function with an additional parameter or a
function returning an array of bounds arrays (which is somewhat
wasteful, but probably it doesn't matter much here)?

I thought about that, but I think the alternatives (e.g. a single
function with a parameter determining which boundary to return). But I
don't think it's better.

What about a common function, maybe called like:

ranges_upper_bounds(PG_FUNCTION_ARGS)
{
AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
Oid element_type = AARR_ELEMTYPE(array);
TypeCacheEntry *typentry;

/* Get information about range type; note column might be a domain */
typentry = range_get_typcache(fcinfo, getBaseType(element_type));

return ranges_bounds_common(typentry, array, false);
}

That saves 40 LOC.

Shouldn't this add some sql tests ?

--
Justin

#11

Tomas Vondra

tomas.vondra@enterprisedb.com

almost 3 years ago

In reply to: Justin Pryzby (#10)

Re: pg_stats and range statistics

On 1/22/23 22:33, Justin Pryzby wrote:

On Sun, Jan 22, 2023 at 07:19:41PM +0100, Tomas Vondra wrote:

On 1/21/23 19:53, Egor Rogov wrote:

Hi Tomas,
On 21.01.2023 00:50, Tomas Vondra wrote:

This simply adds two functions, accepting/producing anyarray - one for
lower bounds, one for upper bounds. I don't think it can be done with a
plain subquery (or at least I don't know how).

Anyarray is an alien to SQL, so functions are well justified here. What
makes me a bit uneasy is two almost identical functions. Should we
consider other options like a function with an additional parameter or a
function returning an array of bounds arrays (which is somewhat
wasteful, but probably it doesn't matter much here)?

I thought about that, but I think the alternatives (e.g. a single
function with a parameter determining which boundary to return). But I
don't think it's better.

What about a common function, maybe called like:

ranges_upper_bounds(PG_FUNCTION_ARGS)
{
AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
Oid element_type = AARR_ELEMTYPE(array);
TypeCacheEntry *typentry;

/* Get information about range type; note column might be a domain */
typentry = range_get_typcache(fcinfo, getBaseType(element_type));

return ranges_bounds_common(typentry, array, false);
}

That saves 40 LOC.

Thanks, that's better. But I'm still not sure it's a good idea to add
function with anyarray argument, when we need it to be an array of
ranges ...

I wonder if we have other functions doing something similar, i.e.
accepting a polymorphic type and then imposing additional restrictions
on it.

Shouldn't this add some sql tests ?

Yeah, I guess we should have a couple tests calling these functions on
different range arrays.

This reminds me lower()/upper() have some extra rules about handling
empty ranges / infinite boundaries etc. These functions should behave
consistently (as if we called lower() in a loop) and I'm pretty sure
that's not the current state.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12

Egor Rogov

e.rogov@postgrespro.ru

almost 3 years ago

In reply to: Tomas Vondra (#11)

Re: pg_stats and range statistics

Hi,

On 23.01.2023 02:21, Tomas Vondra wrote:

On 1/22/23 22:33, Justin Pryzby wrote:

On Sun, Jan 22, 2023 at 07:19:41PM +0100, Tomas Vondra wrote:

On 1/21/23 19:53, Egor Rogov wrote:

Hi Tomas,
On 21.01.2023 00:50, Tomas Vondra wrote:

This simply adds two functions, accepting/producing anyarray - one for
lower bounds, one for upper bounds. I don't think it can be done with a
plain subquery (or at least I don't know how).

Anyarray is an alien to SQL, so functions are well justified here. What
makes me a bit uneasy is two almost identical functions. Should we
consider other options like a function with an additional parameter or a
function returning an array of bounds arrays (which is somewhat
wasteful, but probably it doesn't matter much here)?

I thought about that, but I think the alternatives (e.g. a single
function with a parameter determining which boundary to return). But I
don't think it's better.

What about a common function, maybe called like:

ranges_upper_bounds(PG_FUNCTION_ARGS)
{
AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
Oid element_type = AARR_ELEMTYPE(array);
TypeCacheEntry *typentry;

/* Get information about range type; note column might be a domain */
typentry = range_get_typcache(fcinfo, getBaseType(element_type));

return ranges_bounds_common(typentry, array, false);
}

That saves 40 LOC.

Thanks, that's better. But I'm still not sure it's a good idea to add
function with anyarray argument, when we need it to be an array of
ranges ...

I wonder if we have other functions doing something similar, i.e.
accepting a polymorphic type and then imposing additional restrictions
on it.

I couldn't find such examples, but adding an adhoc polymorphic type just
doesn't look right for me. Besides, you'll end up adding not just
anyrangearray type, but also anymultirangearray,
anycompatiblerangearray, anycompatiblemultirangearray, and maybe their
"non"-counterparts like anynonrangearray, and all of these are not of
much use. And one day you may need an array of arrays or something...

I wonder if it's possible to teach SQL to work with anyarray type - at
runtime the actual type of anyarray elements is known, right? In fact,
unnest() alone is enough to eliminate the need of C functions altogether.

Shouldn't this add some sql tests ?

Yeah, I guess we should have a couple tests calling these functions on
different range arrays.

This reminds me lower()/upper() have some extra rules about handling
empty ranges / infinite boundaries etc. These functions should behave
consistently (as if we called lower() in a loop) and I'm pretty sure
that's not the current state.

I can try to tidy things up, but first we need to decide on the general
approach.

Show quoted text

regards

#13

Egor Rogov

e.rogov@postgrespro.ru

almost 3 years ago

In reply to: Egor Rogov (#12)

Re: pg_stats and range statistics

On 23.01.2023 13:01, Egor Rogov wrote:

On 23.01.2023 02:21, Tomas Vondra wrote:

On 1/22/23 22:33, Justin Pryzby wrote:

On Sun, Jan 22, 2023 at 07:19:41PM +0100, Tomas Vondra wrote:

On 1/21/23 19:53, Egor Rogov wrote:

Hi Tomas,
On 21.01.2023 00:50, Tomas Vondra wrote:

This simply adds two functions, accepting/producing anyarray -
one for
lower bounds, one for upper bounds. I don't think it can be done
with a
plain subquery (or at least I don't know how).

Anyarray is an alien to SQL, so functions are well justified here.
What
makes me a bit uneasy is two almost identical functions. Should we
consider other options like a function with an additional
parameter or a
function returning an array of bounds arrays (which is somewhat
wasteful, but probably it doesn't matter much here)?

I thought about that, but I think the alternatives (e.g. a single
function with a parameter determining which boundary to return). But I
don't think it's better.

What about a common function, maybe called like:

ranges_upper_bounds(PG_FUNCTION_ARGS)
{
     AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
     Oid         element_type = AARR_ELEMTYPE(array);
     TypeCacheEntry *typentry;

     /* Get information about range type; note column might be a
domain */
     typentry = range_get_typcache(fcinfo, getBaseType(element_type));

     return ranges_bounds_common(typentry, array, false);
}

That saves 40 LOC.

Thanks, that's better. But I'm still not sure it's a good idea to add
function with anyarray argument, when we need it to be an array of
ranges ...

I wonder if we have other functions doing something similar, i.e.
accepting a polymorphic type and then imposing additional restrictions
on it.

I couldn't find such examples, but adding an adhoc polymorphic type
just doesn't look right for me. Besides, you'll end up adding not just
anyrangearray type, but also anymultirangearray,
anycompatiblerangearray, anycompatiblemultirangearray, and maybe their
"non"-counterparts like anynonrangearray, and all of these are not of
much use. And one day you may need an array of arrays or something...

I wonder if it's possible to teach SQL to work with anyarray type - at
runtime the actual type of anyarray elements is known, right? In fact,
unnest() alone is enough to eliminate the need of C functions altogether.

When started to look at how we deal with anyarray columns, I came across
the following comment in parse_coerce.c for
enforce_generic_type_consistency():

* A special case is that we could see ANYARRAY as an actual_arg_type even
* when allow_poly is false (this is possible only because pg_statistic has
* columns shown as anyarray in the catalogs).

It makes me realize how anyarray as-a-real-type is specific to
pg_statistic. Even if it's possible to somehow postpone type inference
for this case from parse time to execute time, it clearly doesn't worth
the effort.

So, I am for the simplest possible approach, that is, the two proposed
functions ranges_upper(anyarray) and ranges_lower(anyarray). I am not
even sure if it's worth documenting them, as they are very
pg_statistic-specific and likely won't be useful for end users.

Show quoted text

Shouldn't this add some sql tests ?

Yeah, I guess we should have a couple tests calling these functions on
different range arrays.

This reminds me lower()/upper() have some extra rules about handling
empty ranges / infinite boundaries etc. These functions should behave
consistently (as if we called lower() in a loop) and I'm pretty sure
that's not the current state.

I can try to tidy things up, but first we need to decide on the
general approach.

#14

Gregory Stark (as CFM)

stark.cfm@gmail.com

almost 3 years ago

In reply to: Tomas Vondra (#11)

Re: pg_stats and range statistics

On Sun, 22 Jan 2023 at 18:22, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I wonder if we have other functions doing something similar, i.e.
accepting a polymorphic type and then imposing additional restrictions
on it.

Meh, there's things like array comparison functions that require both
arguments to be the same kind of arrays. And array_agg that requires
the elements to be the same type as the state array (ie, same type as
the first element). Not sure there are any taking just one specific
type though.

Shouldn't this add some sql tests ?

Yeah, I guess we should have a couple tests calling these functions on
different range arrays.

This reminds me lower()/upper() have some extra rules about handling
empty ranges / infinite boundaries etc. These functions should behave
consistently (as if we called lower() in a loop) and I'm pretty sure
that's not the current state.

Are we still waiting on these two items? Egor, do you think you'll
have a chance to work it for this month?

--
Gregory Stark
As Commitfest Manager

#15

Egor Rogov

e.rogov@postgrespro.ru

almost 3 years ago

In reply to: Gregory Stark (as CFM) (#14)

Re: pg_stats and range statistics

On 20.03.2023 22:27, Gregory Stark (as CFM) wrote:

On Sun, 22 Jan 2023 at 18:22, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I wonder if we have other functions doing something similar, i.e.
accepting a polymorphic type and then imposing additional restrictions
on it.

Meh, there's things like array comparison functions that require both
arguments to be the same kind of arrays. And array_agg that requires
the elements to be the same type as the state array (ie, same type as
the first element). Not sure there are any taking just one specific
type though.

Shouldn't this add some sql tests ?

Yeah, I guess we should have a couple tests calling these functions on
different range arrays.

This reminds me lower()/upper() have some extra rules about handling
empty ranges / infinite boundaries etc. These functions should behave
consistently (as if we called lower() in a loop) and I'm pretty sure
that's not the current state.

Are we still waiting on these two items? Egor, do you think you'll
have a chance to work it for this month?

I can try to tidy things up, but I'm not sure if we reached a consensus.

Do we stick with the ranges_upper(anyarray) and ranges_lower(anyarray)
functions? This approach is okay with me. Tomas, have you made up your mind?

Do we want to document these functions? They are very
pg_statistic-specific and won't be useful for end users imo.

#16

Tomas Vondra

tomas.vondra@enterprisedb.com

almost 3 years ago

In reply to: Egor Rogov (#15)

Re: pg_stats and range statistics

On 3/20/23 20:54, Egor Rogov wrote:

On 20.03.2023 22:27, Gregory Stark (as CFM) wrote:

On Sun, 22 Jan 2023 at 18:22, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I wonder if we have other functions doing something similar, i.e.
accepting a polymorphic type and then imposing additional restrictions
on it.

Meh, there's things like array comparison functions that require both
arguments to be the same kind of arrays. And array_agg that requires
the elements to be the same type as the state array (ie, same type as
the first element). Not sure there are any taking just one specific
type though.

Shouldn't this add some sql tests ?

Yeah, I guess we should have a couple tests calling these functions on
different range arrays.

This reminds me lower()/upper() have some extra rules about handling
empty ranges / infinite boundaries etc. These functions should behave
consistently (as if we called lower() in a loop) and I'm pretty sure
that's not the current state.

Are we still waiting on these two items? Egor, do you think you'll
have a chance to work it for this month?

I can try to tidy things up, but I'm not sure if we reached a consensus.

We don't have any objections, and that's probably the best consensus we
can get here, I guess ...

So if you could clean it up a bit, and do something about the two open
items I mentioned (a bunch of tests on different array, and behavior
consistent with lower/upper), that'd be great.

Do we stick with the ranges_upper(anyarray) and ranges_lower(anyarray)
functions? This approach is okay with me. Tomas, have you made up your
mind?

I think the function approach is fine, but in my January 22 message I
was wondering why we're not actually naming them simply lower/upper.

Do we want to document these functions? They are very
pg_statistic-specific and won't be useful for end users imo.

I don't see why not to document them. Sure, we're using them in a fairly
specific context, but I don't see why not to let people use them too
(which would be hard without docs).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17

Egor Rogov

e.rogov@postgrespro.ru

almost 3 years ago

In reply to: Tomas Vondra (#16)

1 attachment(s)

Re: pg_stats and range statistics

On 24.03.2023 01:46, Tomas Vondra wrote:

So if you could clean it up a bit, and do something about the two open
items I mentioned (a bunch of tests on different array,

I've added some tests to resgress/sql/rangetypes.sql, based on the same
dataset that is used to test lower() and upper().

and behavior
consistent with lower/upper),

Done. This required to switch from construct_array(), which doesn't
support NULLs, to construct_md_array(), which does. A nice side effect
is that now we also support multidimentional arrays.

I've moved a common part of ranges_lower_bounds() and
ranges_upper_bounds() to ranges_bounds_common(), following Justin's advice.

There is one thing I'm not sure what to do about. This check:

     if (typentry->typtype != TYPTYPE_RANGE)
         ereport(ERROR,
                 (errcode(ERRCODE_DATATYPE_MISMATCH),
                  errmsg("expected array of ranges")));

doesn't work, because the range_get_typcache() call errors out first
("type %u is not a range type"). The message doesn't look friendly
enough for user-faced SQL function. Should we duplicate
range_get_typcache's logic and replace the error message?

that'd be great.

Do we stick with the ranges_upper(anyarray) and ranges_lower(anyarray)
functions? This approach is okay with me. Tomas, have you made up your
mind?

I think the function approach is fine, but in my January 22 message I
was wondering why we're not actually naming them simply lower/upper.

I'd expect from lower(anyarray) function to return the lowest element in
the array. This name doesn't hint that the function takes an array of
ranges. So, ranges_ prefix seems justified to me.

Do we want to document these functions? They are very
pg_statistic-specific and won't be useful for end users imo.

I don't see why not to document them. Sure, we're using them in a fairly
specific context, but I don't see why not to let people use them too
(which would be hard without docs).

Okay. I've corrected the examples a bit.

The patch is attached.

Thanks,
Egor

Attachments:

pgstats_20230324.patchtext/plain; charset=UTF-8; name=pgstats_20230324.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0987eb805a..c4d86e1679 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19446,6 +19446,24 @@ SELECT NULLIF(value, '(none)') ...
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>ranges_lower</primary>
+        </indexterm>
+        <function>ranges_lower</function> ( <type>anyarray</type> )
+        <returnvalue>anyarray</returnvalue>
+       </para>
+       <para>
+        Extracts lower bounds of ranges in the array (<literal>NULL</literal> if
+        the range is empty or the lower bound is infinite).
+       </para>
+       <para>
+        <literal>lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>
+        <returnvalue>{1.1,3.3}</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -19464,6 +19482,24 @@ SELECT NULLIF(value, '(none)') ...
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>ranges_upper</primary>
+        </indexterm>
+        <function>ranges_upper</function> ( <type>anyarray</type> )
+        <returnvalue>anyarray</returnvalue>
+       </para>
+       <para>
+        Extracts upper bounds of ranges (<literal>NULL</literal> if the
+        range is empty or the upper bound is infinite).
+       </para>
+       <para>
+        <literal>upper(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>
+        <returnvalue>{2.2,4.4}</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 7c8fc3f654..a48d775467 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3783,6 +3783,46 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        non-null elements.  (Null for scalar types.)
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_length_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of the lengths of non-empty and non-null range values of a
+       range type column. (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_empty_frac</structfield> <type>float4</type>
+      </para>
+      <para>
+       Fraction of column entries whose values are empty ranges.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_lower_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of lower bounds of non-empty and non-null range values.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_upper_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of upper bounds of non-empty and non-null range values.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8ea159dbde..882344a2d7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,35 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS range_empty_frac,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 7 THEN ranges_lower(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_lower(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_lower(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_lower(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_lower(stavalues5)
+            END AS range_lower_histogram,
+        CASE
+            WHEN stakind1 = 7 THEN ranges_upper(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_upper(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_upper(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_upper(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_upper(stavalues5)
+            END AS range_upper_histogram
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/backend/utils/adt/rangetypes_typanalyze.c b/src/backend/utils/adt/rangetypes_typanalyze.c
index 86810a1a6e..a288ee3ebc 100644
--- a/src/backend/utils/adt/rangetypes_typanalyze.c
+++ b/src/backend/utils/adt/rangetypes_typanalyze.c
@@ -26,6 +26,8 @@
 
 #include "catalog/pg_operator.h"
 #include "commands/vacuum.h"
+#include "utils/array.h"
+#include "utils/arrayaccess.h"
 #include "utils/float.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
@@ -38,6 +40,7 @@ static int	range_bound_qsort_cmp(const void *a1, const void *a2, void *arg);
 static void compute_range_stats(VacAttrStats *stats,
 								AnalyzeAttrFetchFunc fetchfunc, int samplerows,
 								double totalrows);
+static Datum ranges_bounds_common(TypeCacheEntry *typentry, AnyArrayType *array, bool need_lower);
 
 /*
  * range_typanalyze -- typanalyze function for range columns
@@ -427,3 +430,107 @@ compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
 	 * hashtable should also go away, as it used a child memory context.
 	 */
 }
+
+/*
+ * ranges_bounds_common() -- for ranges_upper_bounds() and ranges_lower_bounds()
+ */
+Datum
+ranges_bounds_common(TypeCacheEntry *typentry, AnyArrayType *array, bool need_lower)
+{
+	int			ndims = AARR_NDIM(array);
+	int		   *dims = AARR_DIMS(array);
+	int		   *lbs = AARR_LBOUND(array);
+	int			i;
+	array_iter	iter;
+	int			nelems;
+	Datum	   *elems;
+	bool	   *nulls;
+	TypeCacheEntry *typentry_element;
+
+	if (typentry->typtype != TYPTYPE_RANGE)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("expected array of ranges")));
+
+	ndims = AARR_NDIM(array);
+	dims = AARR_DIMS(array);
+	nelems = ArrayGetNItems(ndims, dims);
+
+	elems = (Datum *) palloc(nelems * sizeof(Datum));
+	nulls = (bool *) palloc(nelems * sizeof(bool));
+
+	array_iter_setup(&iter, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum		itemvalue;
+		bool		isnull;
+		RangeBound	lower;
+		RangeBound	upper;
+		RangeBound *bound;
+		bool		empty;
+
+		/* Get source element, checking for NULL */
+		itemvalue = array_iter_next(&iter, &isnull, i,
+									typentry->typlen, typentry->typbyval,
+									typentry->typalign);
+
+		Assert(!isnull);
+
+		range_deserialize(typentry, (RangeType *) itemvalue, &lower, &upper, &empty);
+		bound = need_lower ? &lower : &upper;
+	    /* NULL if there's no finite bound - for consistency with range_lower/range_upper */
+		if (empty || bound->infinite)
+		{
+			nulls[i] = true;
+			elems[i] = (Datum) 0;
+		}
+		else
+		{
+			nulls[i] = false;
+			elems[i] = bound->val;
+		}
+	}
+
+	typentry_element = typentry->rngelemtype;
+
+	PG_RETURN_ARRAYTYPE_P(construct_md_array(elems, nulls,
+						  ndims, dims, lbs,
+						  typentry_element->type_id,
+						  typentry_element->typlen,
+						  typentry_element->typbyval,
+						  typentry_element->typalign));
+}
+
+/*
+ * ranges_lower_bounds() -- return array of lower bounds for ranges
+ */
+Datum
+ranges_lower_bounds(PG_FUNCTION_ARGS)
+{
+	AnyArrayType   *array = PG_GETARG_ANY_ARRAY_P(0);
+	Oid				element_type = AARR_ELEMTYPE(array);
+	TypeCacheEntry *typentry;
+
+	/* Get information about range type; note column might be a domain */
+	typentry = range_get_typcache(fcinfo, getBaseType(element_type));
+
+	return ranges_bounds_common(typentry, array, true);
+}
+
+/*
+ * ranges_upper_bounds() -- return array of upper bounds for ranges
+ */
+Datum
+ranges_upper_bounds(PG_FUNCTION_ARGS)
+{
+	AnyArrayType   *array = PG_GETARG_ANY_ARRAY_P(0);
+	Oid				element_type = AARR_ELEMTYPE(array);
+	TypeCacheEntry *typentry;
+
+	/* Get information about range type; note column might be a domain */
+	typentry = range_get_typcache(fcinfo, getBaseType(element_type));
+
+	return ranges_bounds_common(typentry, array, false);
+}
+
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c358cff16..b6bfd32111 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11977,4 +11977,14 @@
   proname => 'any_value_transfn', prorettype => 'anyelement',
   proargtypes => 'anyelement anyelement', prosrc => 'any_value_transfn' },
 
+{ oid => '9693', descr => 'lower bounds of ranges',
+  proname => 'ranges_lower', provolatile => 's',
+  prorettype => 'anyarray', proargtypes => 'anyarray',
+  prosrc => 'ranges_lower_bounds' },
+
+{ oid => '9694', descr => 'upper bounds of ranges',
+  proname => 'ranges_upper', provolatile => 's',
+  prorettype => 'anyarray', proargtypes => 'anyarray',
+  prosrc => 'ranges_upper_bounds' },
+
 ]
diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out
index ee02ff0163..6d7461179f 100644
--- a/src/test/regress/expected/rangetypes.out
+++ b/src/test/regress/expected/rangetypes.out
@@ -597,6 +597,22 @@ select range_intersect_agg(nr) from numrange_test where nr @> 4.0;
  [3,5)
 (1 row)
 
+select ranges_lower(ARRAY[1,2,3]); -- should fail
+ERROR:  type 23 is not a range type
+select ranges_lower(array_agg(nr order by nr)), ranges_upper(array_agg(nr order by nr)) from numrange_test;
+        ranges_lower        |        ranges_upper        
+----------------------------+----------------------------
+ {NULL,NULL,NULL,1.1,1.7,3} | {NULL,5,NULL,2.2,1.7,NULL}
+(1 row)
+
+select ranges_lower(a), ranges_upper(a) from (
+  select ARRAY[ array_agg(nr order by nr), array_agg(nr order by nr desc) ] from numrange_test
+) t(a);
+                      ranges_lower                       |                      ranges_upper                       
+---------------------------------------------------------+---------------------------------------------------------
+ {{NULL,NULL,NULL,1.1,1.7,3},{3,1.7,1.1,NULL,NULL,NULL}} | {{NULL,5,NULL,2.2,1.7,NULL},{NULL,1.7,2.2,NULL,5,NULL}}
+(1 row)
+
 analyze numrange_test;
 create table numrange_test2(nr numrange);
 create index numrange_test2_hash_idx on numrange_test2 using hash (nr);
@@ -751,6 +767,12 @@ select * from textrange_test where tr < 'empty';
 ----
 (0 rows)
 
+select ranges_lower(array_agg(tr order by tr)), ranges_upper(array_agg(tr order by tr)) from textrange_test;
+      ranges_lower      |      ranges_upper      
+------------------------+------------------------
+ {NULL,NULL,NULL,a,b,d} | {NULL,q,NULL,NULL,g,d}
+(1 row)
+
 -- test canonical form for int4range
 select int4range(1, 10, '[]');
  int4range 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 996d22b7dd..41f0310922 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2454,7 +2454,39 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind4 = 5) THEN s.stanumbers4
             WHEN (s.stakind5 = 5) THEN s.stanumbers5
             ELSE NULL::real[]
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stanumbers1[1]
+            WHEN (s.stakind2 = 6) THEN s.stanumbers2[1]
+            WHEN (s.stakind3 = 6) THEN s.stanumbers3[1]
+            WHEN (s.stakind4 = 6) THEN s.stanumbers4[1]
+            WHEN (s.stakind5 = 6) THEN s.stanumbers5[1]
+            ELSE NULL::real
+        END AS range_empty_frac,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stavalues1
+            WHEN (s.stakind2 = 6) THEN s.stavalues2
+            WHEN (s.stakind3 = 6) THEN s.stavalues3
+            WHEN (s.stakind4 = 6) THEN s.stavalues4
+            WHEN (s.stakind5 = 6) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_length_histogram,
+        CASE
+            WHEN (s.stakind1 = 7) THEN ranges_lower(s.stavalues1)
+            WHEN (s.stakind2 = 7) THEN ranges_lower(s.stavalues2)
+            WHEN (s.stakind3 = 7) THEN ranges_lower(s.stavalues3)
+            WHEN (s.stakind4 = 7) THEN ranges_lower(s.stavalues4)
+            WHEN (s.stakind5 = 7) THEN ranges_lower(s.stavalues5)
+            ELSE NULL::anyarray
+        END AS range_lower_histogram,
+        CASE
+            WHEN (s.stakind1 = 7) THEN ranges_upper(s.stavalues1)
+            WHEN (s.stakind2 = 7) THEN ranges_upper(s.stavalues2)
+            WHEN (s.stakind3 = 7) THEN ranges_upper(s.stavalues3)
+            WHEN (s.stakind4 = 7) THEN ranges_upper(s.stavalues4)
+            WHEN (s.stakind5 = 7) THEN ranges_upper(s.stavalues5)
+            ELSE NULL::anyarray
+        END AS range_upper_histogram
    FROM (((pg_statistic s
      JOIN pg_class c ON ((c.oid = s.starelid)))
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index c23be928c3..8948ad3e58 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -133,6 +133,12 @@ select range_intersect_agg(nr) from numrange_test;
 select range_intersect_agg(nr) from numrange_test where false;
 select range_intersect_agg(nr) from numrange_test where nr @> 4.0;
 
+select ranges_lower(ARRAY[1,2,3]); -- should fail
+select ranges_lower(array_agg(nr order by nr)), ranges_upper(array_agg(nr order by nr)) from numrange_test;
+select ranges_lower(a), ranges_upper(a) from (
+  select ARRAY[ array_agg(nr order by nr), array_agg(nr order by nr desc) ] from numrange_test
+) t(a);
+
 analyze numrange_test;
 
 create table numrange_test2(nr numrange);
@@ -195,6 +201,7 @@ select * from textrange_test where tr = '("b","g")';
 select * from textrange_test where tr = '["b","g")';
 select * from textrange_test where tr < 'empty';
 
+select ranges_lower(array_agg(tr order by tr)), ranges_upper(array_agg(tr order by tr)) from textrange_test;
 
 -- test canonical form for int4range
 select int4range(1, 10, '[]');

#18

Gregory Stark (as CFM)

stark.cfm@gmail.com

almost 3 years ago

In reply to: Egor Rogov (#17)

1 attachment(s)

Re: pg_stats and range statistics

On Fri, 24 Mar 2023 at 14:48, Egor Rogov <e.rogov@postgrespro.ru> wrote:

Done.

There is one thing I'm not sure what to do about. This check:

if (typentry->typtype != TYPTYPE_RANGE)
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("expected array of ranges")));

doesn't work, because the range_get_typcache() call errors out first
("type %u is not a range type"). The message doesn't look friendly
enough for user-faced SQL function. Should we duplicate
range_get_typcache's logic and replace the error message?

Okay. I've corrected the examples a bit.

It sounds like you've addressed Tomas's feedback and still have one
open question.

Fwiw I rebased it, it seemed to merge fine automatically.

I've updated the CF entry to Needs Review. But at this late date it
may have to wait until the next release.

--
Gregory Stark
As Commitfest Manager

Attachments:

0001-pg_stats-and-range-statistics.patchtext/x-patch; charset=US-ASCII; name=0001-pg_stats-and-range-statistics.patchDownload

From 87424880f1a970448979681684e6916f33567eeb Mon Sep 17 00:00:00 2001
From: Greg Stark <stark@mit.edu>
Date: Mon, 3 Apr 2023 17:04:11 -0400
Subject: [PATCH] pg_stats and range statistics

---
 doc/src/sgml/func.sgml                        |  36 ++++++
 doc/src/sgml/system-views.sgml                |  40 +++++++
 src/backend/catalog/system_views.sql          |  30 ++++-
 src/backend/utils/adt/rangetypes_typanalyze.c | 107 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |  10 ++
 src/test/regress/expected/rangetypes.out      |  22 ++++
 src/test/regress/expected/rules.out           |  34 +++++-
 src/test/regress/sql/rangetypes.sql           |   7 ++
 8 files changed, 284 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 918a492234..548078a12e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19643,6 +19643,24 @@ SELECT NULLIF(value, '(none)') ...
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>ranges_lower</primary>
+        </indexterm>
+        <function>ranges_lower</function> ( <type>anyarray</type> )
+        <returnvalue>anyarray</returnvalue>
+       </para>
+       <para>
+        Extracts lower bounds of ranges in the array (<literal>NULL</literal> if
+        the range is empty or the lower bound is infinite).
+       </para>
+       <para>
+        <literal>lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>
+        <returnvalue>{1.1,3.3}</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -19661,6 +19679,24 @@ SELECT NULLIF(value, '(none)') ...
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>ranges_upper</primary>
+        </indexterm>
+        <function>ranges_upper</function> ( <type>anyarray</type> )
+        <returnvalue>anyarray</returnvalue>
+       </para>
+       <para>
+        Extracts upper bounds of ranges (<literal>NULL</literal> if the
+        range is empty or the upper bound is infinite).
+       </para>
+       <para>
+        <literal>upper(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>
+        <returnvalue>{2.2,4.4}</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index bb1a418450..d7760838ae 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3784,6 +3784,46 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        non-null elements.  (Null for scalar types.)
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_length_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of the lengths of non-empty and non-null range values of a
+       range type column. (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_empty_frac</structfield> <type>float4</type>
+      </para>
+      <para>
+       Fraction of column entries whose values are empty ranges.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_lower_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of lower bounds of non-empty and non-null range values.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_upper_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of upper bounds of non-empty and non-null range values.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 574cbc2e44..3fb7ee448f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,35 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS range_empty_frac,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 7 THEN ranges_lower(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_lower(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_lower(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_lower(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_lower(stavalues5)
+            END AS range_lower_histogram,
+        CASE
+            WHEN stakind1 = 7 THEN ranges_upper(stavalues1)
+            WHEN stakind2 = 7 THEN ranges_upper(stavalues2)
+            WHEN stakind3 = 7 THEN ranges_upper(stavalues3)
+            WHEN stakind4 = 7 THEN ranges_upper(stavalues4)
+            WHEN stakind5 = 7 THEN ranges_upper(stavalues5)
+            END AS range_upper_histogram
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/backend/utils/adt/rangetypes_typanalyze.c b/src/backend/utils/adt/rangetypes_typanalyze.c
index 86810a1a6e..a288ee3ebc 100644
--- a/src/backend/utils/adt/rangetypes_typanalyze.c
+++ b/src/backend/utils/adt/rangetypes_typanalyze.c
@@ -26,6 +26,8 @@
 
 #include "catalog/pg_operator.h"
 #include "commands/vacuum.h"
+#include "utils/array.h"
+#include "utils/arrayaccess.h"
 #include "utils/float.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
@@ -38,6 +40,7 @@ static int	range_bound_qsort_cmp(const void *a1, const void *a2, void *arg);
 static void compute_range_stats(VacAttrStats *stats,
 								AnalyzeAttrFetchFunc fetchfunc, int samplerows,
 								double totalrows);
+static Datum ranges_bounds_common(TypeCacheEntry *typentry, AnyArrayType *array, bool need_lower);
 
 /*
  * range_typanalyze -- typanalyze function for range columns
@@ -427,3 +430,107 @@ compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
 	 * hashtable should also go away, as it used a child memory context.
 	 */
 }
+
+/*
+ * ranges_bounds_common() -- for ranges_upper_bounds() and ranges_lower_bounds()
+ */
+Datum
+ranges_bounds_common(TypeCacheEntry *typentry, AnyArrayType *array, bool need_lower)
+{
+	int			ndims = AARR_NDIM(array);
+	int		   *dims = AARR_DIMS(array);
+	int		   *lbs = AARR_LBOUND(array);
+	int			i;
+	array_iter	iter;
+	int			nelems;
+	Datum	   *elems;
+	bool	   *nulls;
+	TypeCacheEntry *typentry_element;
+
+	if (typentry->typtype != TYPTYPE_RANGE)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("expected array of ranges")));
+
+	ndims = AARR_NDIM(array);
+	dims = AARR_DIMS(array);
+	nelems = ArrayGetNItems(ndims, dims);
+
+	elems = (Datum *) palloc(nelems * sizeof(Datum));
+	nulls = (bool *) palloc(nelems * sizeof(bool));
+
+	array_iter_setup(&iter, array);
+
+	for (i = 0; i < nelems; i++)
+	{
+		Datum		itemvalue;
+		bool		isnull;
+		RangeBound	lower;
+		RangeBound	upper;
+		RangeBound *bound;
+		bool		empty;
+
+		/* Get source element, checking for NULL */
+		itemvalue = array_iter_next(&iter, &isnull, i,
+									typentry->typlen, typentry->typbyval,
+									typentry->typalign);
+
+		Assert(!isnull);
+
+		range_deserialize(typentry, (RangeType *) itemvalue, &lower, &upper, &empty);
+		bound = need_lower ? &lower : &upper;
+	    /* NULL if there's no finite bound - for consistency with range_lower/range_upper */
+		if (empty || bound->infinite)
+		{
+			nulls[i] = true;
+			elems[i] = (Datum) 0;
+		}
+		else
+		{
+			nulls[i] = false;
+			elems[i] = bound->val;
+		}
+	}
+
+	typentry_element = typentry->rngelemtype;
+
+	PG_RETURN_ARRAYTYPE_P(construct_md_array(elems, nulls,
+						  ndims, dims, lbs,
+						  typentry_element->type_id,
+						  typentry_element->typlen,
+						  typentry_element->typbyval,
+						  typentry_element->typalign));
+}
+
+/*
+ * ranges_lower_bounds() -- return array of lower bounds for ranges
+ */
+Datum
+ranges_lower_bounds(PG_FUNCTION_ARGS)
+{
+	AnyArrayType   *array = PG_GETARG_ANY_ARRAY_P(0);
+	Oid				element_type = AARR_ELEMTYPE(array);
+	TypeCacheEntry *typentry;
+
+	/* Get information about range type; note column might be a domain */
+	typentry = range_get_typcache(fcinfo, getBaseType(element_type));
+
+	return ranges_bounds_common(typentry, array, true);
+}
+
+/*
+ * ranges_upper_bounds() -- return array of upper bounds for ranges
+ */
+Datum
+ranges_upper_bounds(PG_FUNCTION_ARGS)
+{
+	AnyArrayType   *array = PG_GETARG_ANY_ARRAY_P(0);
+	Oid				element_type = AARR_ELEMTYPE(array);
+	TypeCacheEntry *typentry;
+
+	/* Get information about range type; note column might be a domain */
+	typentry = range_get_typcache(fcinfo, getBaseType(element_type));
+
+	return ranges_bounds_common(typentry, array, false);
+}
+
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f9f2642201..b986610f09 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12053,4 +12053,14 @@
   proname => 'any_value_transfn', prorettype => 'anyelement',
   proargtypes => 'anyelement anyelement', prosrc => 'any_value_transfn' },
 
+{ oid => '9693', descr => 'lower bounds of ranges',
+  proname => 'ranges_lower', provolatile => 's',
+  prorettype => 'anyarray', proargtypes => 'anyarray',
+  prosrc => 'ranges_lower_bounds' },
+
+{ oid => '9694', descr => 'upper bounds of ranges',
+  proname => 'ranges_upper', provolatile => 's',
+  prorettype => 'anyarray', proargtypes => 'anyarray',
+  prosrc => 'ranges_upper_bounds' },
+
 ]
diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out
index ee02ff0163..6d7461179f 100644
--- a/src/test/regress/expected/rangetypes.out
+++ b/src/test/regress/expected/rangetypes.out
@@ -597,6 +597,22 @@ select range_intersect_agg(nr) from numrange_test where nr @> 4.0;
  [3,5)
 (1 row)
 
+select ranges_lower(ARRAY[1,2,3]); -- should fail
+ERROR:  type 23 is not a range type
+select ranges_lower(array_agg(nr order by nr)), ranges_upper(array_agg(nr order by nr)) from numrange_test;
+        ranges_lower        |        ranges_upper        
+----------------------------+----------------------------
+ {NULL,NULL,NULL,1.1,1.7,3} | {NULL,5,NULL,2.2,1.7,NULL}
+(1 row)
+
+select ranges_lower(a), ranges_upper(a) from (
+  select ARRAY[ array_agg(nr order by nr), array_agg(nr order by nr desc) ] from numrange_test
+) t(a);
+                      ranges_lower                       |                      ranges_upper                       
+---------------------------------------------------------+---------------------------------------------------------
+ {{NULL,NULL,NULL,1.1,1.7,3},{3,1.7,1.1,NULL,NULL,NULL}} | {{NULL,5,NULL,2.2,1.7,NULL},{NULL,1.7,2.2,NULL,5,NULL}}
+(1 row)
+
 analyze numrange_test;
 create table numrange_test2(nr numrange);
 create index numrange_test2_hash_idx on numrange_test2 using hash (nr);
@@ -751,6 +767,12 @@ select * from textrange_test where tr < 'empty';
 ----
 (0 rows)
 
+select ranges_lower(array_agg(tr order by tr)), ranges_upper(array_agg(tr order by tr)) from textrange_test;
+      ranges_lower      |      ranges_upper      
+------------------------+------------------------
+ {NULL,NULL,NULL,a,b,d} | {NULL,q,NULL,NULL,g,d}
+(1 row)
+
 -- test canonical form for int4range
 select int4range(1, 10, '[]');
  int4range 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index ab1aebfde4..7c783ce47a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2455,7 +2455,39 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind4 = 5) THEN s.stanumbers4
             WHEN (s.stakind5 = 5) THEN s.stanumbers5
             ELSE NULL::real[]
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stanumbers1[1]
+            WHEN (s.stakind2 = 6) THEN s.stanumbers2[1]
+            WHEN (s.stakind3 = 6) THEN s.stanumbers3[1]
+            WHEN (s.stakind4 = 6) THEN s.stanumbers4[1]
+            WHEN (s.stakind5 = 6) THEN s.stanumbers5[1]
+            ELSE NULL::real
+        END AS range_empty_frac,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stavalues1
+            WHEN (s.stakind2 = 6) THEN s.stavalues2
+            WHEN (s.stakind3 = 6) THEN s.stavalues3
+            WHEN (s.stakind4 = 6) THEN s.stavalues4
+            WHEN (s.stakind5 = 6) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_length_histogram,
+        CASE
+            WHEN (s.stakind1 = 7) THEN ranges_lower(s.stavalues1)
+            WHEN (s.stakind2 = 7) THEN ranges_lower(s.stavalues2)
+            WHEN (s.stakind3 = 7) THEN ranges_lower(s.stavalues3)
+            WHEN (s.stakind4 = 7) THEN ranges_lower(s.stavalues4)
+            WHEN (s.stakind5 = 7) THEN ranges_lower(s.stavalues5)
+            ELSE NULL::anyarray
+        END AS range_lower_histogram,
+        CASE
+            WHEN (s.stakind1 = 7) THEN ranges_upper(s.stavalues1)
+            WHEN (s.stakind2 = 7) THEN ranges_upper(s.stavalues2)
+            WHEN (s.stakind3 = 7) THEN ranges_upper(s.stavalues3)
+            WHEN (s.stakind4 = 7) THEN ranges_upper(s.stavalues4)
+            WHEN (s.stakind5 = 7) THEN ranges_upper(s.stavalues5)
+            ELSE NULL::anyarray
+        END AS range_upper_histogram
    FROM (((pg_statistic s
      JOIN pg_class c ON ((c.oid = s.starelid)))
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index c23be928c3..8948ad3e58 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -133,6 +133,12 @@ select range_intersect_agg(nr) from numrange_test;
 select range_intersect_agg(nr) from numrange_test where false;
 select range_intersect_agg(nr) from numrange_test where nr @> 4.0;
 
+select ranges_lower(ARRAY[1,2,3]); -- should fail
+select ranges_lower(array_agg(nr order by nr)), ranges_upper(array_agg(nr order by nr)) from numrange_test;
+select ranges_lower(a), ranges_upper(a) from (
+  select ARRAY[ array_agg(nr order by nr), array_agg(nr order by nr desc) ] from numrange_test
+) t(a);
+
 analyze numrange_test;
 
 create table numrange_test2(nr numrange);
@@ -195,6 +201,7 @@ select * from textrange_test where tr = '("b","g")';
 select * from textrange_test where tr = '["b","g")';
 select * from textrange_test where tr < 'empty';
 
+select ranges_lower(array_agg(tr order by tr)), ranges_upper(array_agg(tr order by tr)) from textrange_test;
 
 -- test canonical form for int4range
 select int4range(1, 10, '[]');
-- 
2.40.0

#19

jian he

jian.universality@gmail.com

over 2 years ago

In reply to: Gregory Stark (as CFM) (#18)

Re: pg_stats and range statistics

hi. I played around with the 2023-Apr 4 latest patch.

+        <literal>lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>
should be
+        <literal>ranges_lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>

+        <literal>upper(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>
should be
+        <literal>ranges_upper(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>

https://www.postgresql.org/docs/current/catalog-pg-type.html
there is no association between numrange and their base type numeric.
so for template: anyarray ranges_lower(anyarray). I don't think we can
input numrange array and return a numeric array.

https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC

When the return value of a function is declared as a polymorphic type, there must be at least one argument position that is also >> polymorphic, and the actual data type(s) supplied for the polymorphic arguments determine the actual result type for that call.

regression=# select
ranges_lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4),
numrange(5.5,6.6)]);
ranges_lower
---------------
{1.1,3.3,5.5}
(1 row)
regression=# \gdesc
Column | Type
--------------+------------
ranges_lower | numrange[]
(1 row)

I don't think you can cast literal ' {1.1,3.3,5.5}' to numrange[].

#20

Alexander Korotkov

aekorotkov@gmail.com

about 2 years ago

In reply to: jian he (#19)

2 attachment(s)

Re: pg_stats and range statistics

Hi!

On Wed, Sep 6, 2023 at 6:18 PM jian he <jian.universality@gmail.com> wrote:

+        <literal>lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>
should be
+        <literal>ranges_lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>
+        <literal>upper(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>
should be
+        <literal>ranges_upper(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])</literal>
https://www.postgresql.org/docs/current/catalog-pg-type.html
there is no association between numrange and their base type numeric.
so for template: anyarray ranges_lower(anyarray). I don't think we can
input numrange array and return a numeric array.

https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC

When the return value of a function is declared as a polymorphic type, there must be at least one argument position that is also >> polymorphic, and the actual data type(s) supplied for the polymorphic arguments determine the actual result type for that call.

regression=# select
ranges_lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4),
numrange(5.5,6.6)]);
ranges_lower
---------------
{1.1,3.3,5.5}
(1 row)
regression=# \gdesc
Column | Type
--------------+------------
ranges_lower | numrange[]
(1 row)

I don't think you can cast literal ' {1.1,3.3,5.5}' to numrange[].

Thank you for noticing this. Indeed, our polymorphic type system
doesn't support this case. In order to support this, we need
something like "anyrangearray" pseudo-type. However, it seems
overkill to introduce a new pseudo-type just to update pg_stats.

Additionally, I found that the current patch can't handle infinite
range bounds and discards information about inclusiveness of range
bounds. The infinite bounds could be represented as NULL (while I'm
not sure how good this representation is). Regarding inclusiveness, I
don't see the possibility to represent them in a reasonable way within
an array of base types. I also don't feel good about discarding the
accuracy in the pg_stats view.

In conclusion of all of the above, I decided to revise the patch and
show the bounds histogram as it's stored in pg_statistic. I revised
the docs correspondingly.

Also for some reason, the patch added description of new columns to
the documentation of pg_user_mapping table. I've fixed that by moving
them to the documentation of pg_stats view.

Also, I've extracted the new comment in pg_statistic.h into a separate patch.

I'm going to push this if there are no objections.

------
Regards,
Alexander Korotkov

Attachments:

0001-Add-comment-to-pg_statistic-catalog-table-20231125.patchapplication/octet-stream; name=0001-Add-comment-to-pg_statistic-catalog-table-20231125.patchDownload

From ae3461fd57564200b9fdc1216116bef512d3089c Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 25 Nov 2023 00:38:05 +0200
Subject: [PATCH 1/2] Add comment to pg_statistic catalog table

Make a reminder that pg_stats view needs to be modified whenever a new slot
kind is added.  To prevent situations like 918eee0c49 when pg_stats was
forgotten to be updated.

Discussion: https://postgr.es/m/flat/b67d8b57-9357-7e82-a2e7-f6ce6eaeec67@postgrespro.ru
Author: Egor Rogov, Soumyadeep Chakraborty
Reviewed-by: Tomas Vondra, Justin Pryzby, Jian He
---
 src/include/catalog/pg_statistic.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 0808c5fc893..95878422752 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -152,6 +152,9 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  * data "kind" will appear in any particular slot.  Instead, search the
  * stakind fields to see if the desired data is available.  (The standard
  * function get_attstatsslot() may be used for this.)
+ *
+ * Note: The pg_stats view needs to be modified whenever a new slot kind is
+ * added to core.
  */
 
 /*
-- 
2.39.3 (Apple Git-145)

0002-Display-length-and-bounds-histograms-in-pg_-20231125.patchapplication/octet-stream; name=0002-Display-length-and-bounds-histograms-in-pg_-20231125.patchDownload

From a92bd68d5f47137426b6007bf357ea95693eba2f Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 25 Nov 2023 00:38:37 +0200
Subject: [PATCH 2/2] Display length and bounds histograms in pg_stats

Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these
slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.

Catversion is bumped.

Discussion: https://postgr.es/m/flat/b67d8b57-9357-7e82-a2e7-f6ce6eaeec67@postgrespro.ru
Author: Egor Rogov, Soumyadeep Chakraborty
Reviewed-by: Tomas Vondra, Justin Pryzby, Jian He
---
 doc/src/sgml/system-views.sgml       | 35 ++++++++++++++++++++++++++++
 src/backend/catalog/system_views.sql | 23 +++++++++++++++++-
 src/test/regress/expected/rules.out  | 26 ++++++++++++++++++++-
 3 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 7078491c4c0..73da3850742 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3799,6 +3799,41 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        non-null elements.  (Null for scalar types.)
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_length_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of the lengths of non-empty and non-null range values of a
+       range type column. (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_empty_frac</structfield> <type>float4</type>
+      </para>
+      <para>
+       Fraction of column entries whose values are empty ranges.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_bounds_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of lower and upper bounds of non-empty and non-null range
+       values. (Null for non-range types.)
+      </para>
+      <para>
+       These two histograms are represented as a single array of ranges, whose
+       lower bounds represent the histogram of lower bounds, and upper bounds
+       represent the histogram of upper bounds.
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b65f6b52490..11d18ed9dd6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,28 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS range_empty_frac,
+        CASE
+            WHEN stakind1 = 7 THEN stavalues1
+            WHEN stakind2 = 7 THEN stavalues2
+            WHEN stakind3 = 7 THEN stavalues3
+            WHEN stakind4 = 7 THEN stavalues4
+            WHEN stakind5 = 7 THEN stavalues5
+            END AS range_bounds_histogram
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 1442c43d9c5..05070393b99 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2466,7 +2466,31 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind4 = 5) THEN s.stanumbers4
             WHEN (s.stakind5 = 5) THEN s.stanumbers5
             ELSE NULL::real[]
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stavalues1
+            WHEN (s.stakind2 = 6) THEN s.stavalues2
+            WHEN (s.stakind3 = 6) THEN s.stavalues3
+            WHEN (s.stakind4 = 6) THEN s.stavalues4
+            WHEN (s.stakind5 = 6) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_length_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stanumbers1[1]
+            WHEN (s.stakind2 = 6) THEN s.stanumbers2[1]
+            WHEN (s.stakind3 = 6) THEN s.stanumbers3[1]
+            WHEN (s.stakind4 = 6) THEN s.stanumbers4[1]
+            WHEN (s.stakind5 = 6) THEN s.stanumbers5[1]
+            ELSE NULL::real
+        END AS range_empty_frac,
+        CASE
+            WHEN (s.stakind1 = 7) THEN s.stavalues1
+            WHEN (s.stakind2 = 7) THEN s.stavalues2
+            WHEN (s.stakind3 = 7) THEN s.stavalues3
+            WHEN (s.stakind4 = 7) THEN s.stavalues4
+            WHEN (s.stakind5 = 7) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_bounds_histogram
    FROM (((pg_statistic s
      JOIN pg_class c ON ((c.oid = s.starelid)))
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))
-- 
2.39.3 (Apple Git-145)

#21

jian he

jian.universality@gmail.com

about 2 years ago

In reply to: Alexander Korotkov (#20)

Re: pg_stats and range statistics

On Sat, Nov 25, 2023 at 7:06 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:

Hi!

I'm going to push this if there are no objections.

------
Regards,
Alexander Korotkov

src/include/catalog/pg_statistic.h
268: * range type's subdiff function. Only non-null rows are considered.

should it be: * range type's subdiff function. Only non-null,
non-empty rows are considered.

Other than that, it looks fine to me.

#22

jian he

jian.universality@gmail.com

about 2 years ago

In reply to: Alexander Korotkov (#20)

Re: pg_stats and range statistics

On Sat, Nov 25, 2023 at 7:06 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:

Hi!
Additionally, I found that the current patch can't handle infinite
range bounds and discards information about inclusiveness of range
bounds. The infinite bounds could be represented as NULL (while I'm
not sure how good this representation is). Regarding inclusiveness, I
don't see the possibility to represent them in a reasonable way within
an array of base types. I also don't feel good about discarding the
accuracy in the pg_stats view.

in range_length_histogram, maybe we can document that when calculating
the length of a range, inclusiveness will be true.

#23

Egor Rogov

e.rogov@postgrespro.ru

about 2 years ago

In reply to: Alexander Korotkov (#20)

Re: pg_stats and range statistics

Hi Alexander,

On 25.11.2023 02:06, Alexander Korotkov wrote:

In conclusion of all of the above, I decided to revise the patch and
show the bounds histogram as it's stored in pg_statistic. I revised
the docs correspondingly.

So basically we returned to what it all has started from? I guess it's
better than nothing, although I have to admit that two-array
representation is much more readable. Unfortunately it brings in a
surprising amount of complexity.

Anyway, thanks for looking into this!

#24

Alexander Korotkov

aekorotkov@gmail.com

about 2 years ago

In reply to: jian he (#22)

2 attachment(s)

Re: pg_stats and range statistics

On Sat, Nov 25, 2023 at 10:58 AM jian he <jian.universality@gmail.com> wrote:

On Sat, Nov 25, 2023 at 7:06 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:

Hi!
Additionally, I found that the current patch can't handle infinite
range bounds and discards information about inclusiveness of range
bounds. The infinite bounds could be represented as NULL (while I'm
not sure how good this representation is). Regarding inclusiveness, I
don't see the possibility to represent them in a reasonable way within
an array of base types. I also don't feel good about discarding the
accuracy in the pg_stats view.

in range_length_histogram, maybe we can document that when calculating
the length of a range, inclusiveness will be true.

I've revised the patchset. Edited comment in pg_statistic.h as you
proposed. And I've added to the documentation a short note on how the
range length histogram is calculated.

------
Regards,
Alexander Korotkov

Attachments:

0001-Update-comments-for-pg_statistic-catalog--20231125-2.patchapplication/octet-stream; name=0001-Update-comments-for-pg_statistic-catalog--20231125-2.patchDownload

From 1887f2ed8cc16a00c60bf72533a8bb195c040faf Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 25 Nov 2023 00:38:05 +0200
Subject: [PATCH 1/2] Update comments for pg_statistic catalog table

Make a reminder that pg_stats view needs to be modified whenever a new slot
kind is added.  To prevent situations like 918eee0c49 when pg_stats was
forgotten to be updated.

Also, revise the comment that only non-null, non-empty rows are considered
for the range length histogram.

Discussion: https://postgr.es/m/flat/b67d8b57-9357-7e82-a2e7-f6ce6eaeec67@postgrespro.ru
Author: Egor Rogov, Soumyadeep Chakraborty
Reviewed-by: Tomas Vondra, Justin Pryzby, Jian He
---
 src/include/catalog/pg_statistic.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 0808c5fc893..87f48dc6894 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -152,6 +152,9 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  * data "kind" will appear in any particular slot.  Instead, search the
  * stakind fields to see if the desired data is available.  (The standard
  * function get_attstatsslot() may be used for this.)
+ *
+ * Note: The pg_stats view needs to be modified whenever a new slot kind is
+ * added to core.
  */
 
 /*
@@ -262,7 +265,8 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  * a format similar to STATISTIC_KIND_HISTOGRAM: it contains M (>=2) range
  * values that divide the column data values into M-1 bins of approximately
  * equal population. The lengths are stored as float8s, as measured by the
- * range type's subdiff function. Only non-null rows are considered.
+ * range type's subdiff function. Only non-null, non-empty rows are
+ * considered.
  */
 #define STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM  6
 
-- 
2.39.3 (Apple Git-145)

0002-Display-length-and-bounds-histograms-in-p-20231125-2.patchapplication/octet-stream; name=0002-Display-length-and-bounds-histograms-in-p-20231125-2.patchDownload

From 235b2f3de5fff7ae07b0499831435d7b536d6306 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 25 Nov 2023 00:38:37 +0200
Subject: [PATCH 2/2] Display length and bounds histograms in pg_stats

Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these
slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.

Catversion is bumped.

Discussion: https://postgr.es/m/flat/b67d8b57-9357-7e82-a2e7-f6ce6eaeec67@postgrespro.ru
Author: Egor Rogov, Soumyadeep Chakraborty
Reviewed-by: Tomas Vondra, Justin Pryzby, Jian He
---
 doc/src/sgml/system-views.sgml       | 39 ++++++++++++++++++++++++++++
 src/backend/catalog/system_views.sql | 23 +++++++++++++++-
 src/test/regress/expected/rules.out  | 26 ++++++++++++++++++-
 3 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 7078491c4c0..0ef17456318 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3799,6 +3799,45 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        non-null elements.  (Null for scalar types.)
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_length_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of the lengths of non-empty and non-null range values of a
+       range type column. (Null for non-range types.)
+      </para>
+      <para>
+       This histogram is calculated using the <literal>subtype_diff</literal>
+       range function regardless of whether range bounds are inclusive.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_empty_frac</structfield> <type>float4</type>
+      </para>
+      <para>
+       Fraction of column entries whose values are empty ranges.
+       (Null for non-range types.)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>range_bounds_histogram</structfield> <type>anyarray</type>
+      </para>
+      <para>
+       A histogram of lower and upper bounds of non-empty and non-null range
+       values. (Null for non-range types.)
+      </para>
+      <para>
+       These two histograms are represented as a single array of ranges, whose
+       lower bounds represent the histogram of lower bounds, and upper bounds
+       represent the histogram of upper bounds.
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b65f6b52490..11d18ed9dd6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,28 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
             WHEN stakind3 = 5 THEN stanumbers3
             WHEN stakind4 = 5 THEN stanumbers4
             WHEN stakind5 = 5 THEN stanumbers5
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stavalues1
+            WHEN stakind2 = 6 THEN stavalues2
+            WHEN stakind3 = 6 THEN stavalues3
+            WHEN stakind4 = 6 THEN stavalues4
+            WHEN stakind5 = 6 THEN stavalues5
+        END AS range_length_histogram,
+        CASE
+            WHEN stakind1 = 6 THEN stanumbers1[1]
+            WHEN stakind2 = 6 THEN stanumbers2[1]
+            WHEN stakind3 = 6 THEN stanumbers3[1]
+            WHEN stakind4 = 6 THEN stanumbers4[1]
+            WHEN stakind5 = 6 THEN stanumbers5[1]
+        END AS range_empty_frac,
+        CASE
+            WHEN stakind1 = 7 THEN stavalues1
+            WHEN stakind2 = 7 THEN stavalues2
+            WHEN stakind3 = 7 THEN stavalues3
+            WHEN stakind4 = 7 THEN stavalues4
+            WHEN stakind5 = 7 THEN stavalues5
+            END AS range_bounds_histogram
     FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
          JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
          LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 1442c43d9c5..05070393b99 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2466,7 +2466,31 @@ pg_stats| SELECT n.nspname AS schemaname,
             WHEN (s.stakind4 = 5) THEN s.stanumbers4
             WHEN (s.stakind5 = 5) THEN s.stanumbers5
             ELSE NULL::real[]
-        END AS elem_count_histogram
+        END AS elem_count_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stavalues1
+            WHEN (s.stakind2 = 6) THEN s.stavalues2
+            WHEN (s.stakind3 = 6) THEN s.stavalues3
+            WHEN (s.stakind4 = 6) THEN s.stavalues4
+            WHEN (s.stakind5 = 6) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_length_histogram,
+        CASE
+            WHEN (s.stakind1 = 6) THEN s.stanumbers1[1]
+            WHEN (s.stakind2 = 6) THEN s.stanumbers2[1]
+            WHEN (s.stakind3 = 6) THEN s.stanumbers3[1]
+            WHEN (s.stakind4 = 6) THEN s.stanumbers4[1]
+            WHEN (s.stakind5 = 6) THEN s.stanumbers5[1]
+            ELSE NULL::real
+        END AS range_empty_frac,
+        CASE
+            WHEN (s.stakind1 = 7) THEN s.stavalues1
+            WHEN (s.stakind2 = 7) THEN s.stavalues2
+            WHEN (s.stakind3 = 7) THEN s.stavalues3
+            WHEN (s.stakind4 = 7) THEN s.stavalues4
+            WHEN (s.stakind5 = 7) THEN s.stavalues5
+            ELSE NULL::anyarray
+        END AS range_bounds_histogram
    FROM (((pg_statistic s
      JOIN pg_class c ON ((c.oid = s.starelid)))
      JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum))))
-- 
2.39.3 (Apple Git-145)

#25

Alexander Korotkov

aekorotkov@gmail.com

about 2 years ago

In reply to: Egor Rogov (#23)

Re: pg_stats and range statistics

On Sat, Nov 25, 2023 at 11:14 AM Egor Rogov <e.rogov@postgrespro.ru> wrote:

Hi Alexander,

On 25.11.2023 02:06, Alexander Korotkov wrote:

In conclusion of all of the above, I decided to revise the patch and
show the bounds histogram as it's stored in pg_statistic. I revised
the docs correspondingly.

So basically we returned to what it all has started from? I guess it's
better than nothing, although I have to admit that two-array
representation is much more readable. Unfortunately it brings in a
surprising amount of complexity.

Yep, it is.

Anyway, thanks for looking into this!

And thank you for the feedback!

------
Regards,
Alexander Korotkov