list of extended statistics on psql

Started by Tatsuro Yamadaover 5 years ago82 messageshackers
Jump to latest
#1Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp

Hi!

I created a POC patch that allows showing a list of extended statistics by
"\dz" command on psql. I believe this feature helps DBA and users who
would like to know all extended statistics easily. :-D

I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
than "\dz" but they were already assigned. Therefore I used "\dz"
instead of them.

Please find the attached patch.
Any comments are welcome!

For Example:
=======================
CREATE TABLE t1 (a INT, b INT);
CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
ANALYZE t1;

CREATE TABLE t2 (a INT, b INT, c INT);
CREATE STATISTICS stts4 ON b, c FROM t2;
ANALYZE t2;

postgres=# \dz
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-------+---------+-----------+--------------+-----
public | t1 | stts1 | a, b | f | t | f
public | t1 | stts2 | a, b | t | t | f
public | t1 | stts3 | a, b | t | t | t
public | t2 | stts4 | b, c | t | t | t
(4 rows)

postgres=# \?
...
\dy [PATTERN] list event triggers
\dz [PATTERN] list extended statistics
\l[+] [PATTERN] list databases
...
=======================

For now, I haven't written a document and regression test for that.
I'll create it later.

Thanks,
Tatsuro Yamada

Attachments:

add_list_extended_stats_for_psql_poc1.patchtext/plain; charset=UTF-8; name=add_list_extended_stats_for_psql_poc1.patchDownload+68-0
#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tatsuro Yamada (#1)
Re: list of extended statistics on psql

po 24. 8. 2020 v 5:23 odesílatel Tatsuro Yamada <
tatsuro.yamada.tf@nttcom.co.jp> napsal:

Hi!

I created a POC patch that allows showing a list of extended statistics by
"\dz" command on psql. I believe this feature helps DBA and users who
would like to know all extended statistics easily. :-D

I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
than "\dz" but they were already assigned. Therefore I used "\dz"
instead of them.

Please find the attached patch.
Any comments are welcome!

For Example:
=======================
CREATE TABLE t1 (a INT, b INT);
CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
ANALYZE t1;

CREATE TABLE t2 (a INT, b INT, c INT);
CREATE STATISTICS stts4 ON b, c FROM t2;
ANALYZE t2;

postgres=# \dz
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-------+---------+-----------+--------------+-----
public | t1 | stts1 | a, b | f | t | f
public | t1 | stts2 | a, b | t | t | f
public | t1 | stts3 | a, b | t | t | t
public | t2 | stts4 | b, c | t | t | t
(4 rows)

postgres=# \?
...
\dy [PATTERN] list event triggers
\dz [PATTERN] list extended statistics
\l[+] [PATTERN] list databases
...
=======================

For now, I haven't written a document and regression test for that.
I'll create it later.

+1 good idea

Pavel

Show quoted text

Thanks,
Tatsuro Yamada

#3Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#2)
Re: list of extended statistics on psql

On Mon, Aug 24, 2020 at 6:13 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:

po 24. 8. 2020 v 5:23 odesílatel Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> napsal:

Hi!

I created a POC patch that allows showing a list of extended statistics by
"\dz" command on psql. I believe this feature helps DBA and users who
would like to know all extended statistics easily. :-D

I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
than "\dz" but they were already assigned. Therefore I used "\dz"
instead of them.

Please find the attached patch.
Any comments are welcome!

For Example:
=======================
CREATE TABLE t1 (a INT, b INT);
CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
ANALYZE t1;

CREATE TABLE t2 (a INT, b INT, c INT);
CREATE STATISTICS stts4 ON b, c FROM t2;
ANALYZE t2;

postgres=# \dz
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-------+---------+-----------+--------------+-----
public | t1 | stts1 | a, b | f | t | f
public | t1 | stts2 | a, b | t | t | f
public | t1 | stts3 | a, b | t | t | t
public | t2 | stts4 | b, c | t | t | t
(4 rows)

postgres=# \?
...
\dy [PATTERN] list event triggers
\dz [PATTERN] list extended statistics
\l[+] [PATTERN] list databases
...
=======================

For now, I haven't written a document and regression test for that.
I'll create it later.

+1 good idea

+1 that's a good idea. Please add it to the next commitfest!

You have a typo:

+    if (pset.sversion < 10000)
+    {
+        char        sverbuf[32];
+
+        pg_log_error("The server (version %s) does not support
extended statistics.",
+                     formatPGVersionNumber(pset.sversion, false,
+                                           sverbuf, sizeof(sverbuf)));
+        return true;
+    }

the version test is missing a 0, the feature looks otherwise ok.

How about using \dX rather than \dz?

#4Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Julien Rouhaud (#3)
Re: list of extended statistics on psql

Hi!

+1 good idea

+1 that's a good idea. Please add it to the next commitfest!

Thanks!

You have a typo:

+    if (pset.sversion < 10000)
+    {
+        char        sverbuf[32];
+
+        pg_log_error("The server (version %s) does not support
extended statistics.",
+                     formatPGVersionNumber(pset.sversion, false,
+                                           sverbuf, sizeof(sverbuf)));
+        return true;
+    }

the version test is missing a 0, the feature looks otherwise ok.

Ouch, I fixed on the attached patch.

The new patch includes:

- Fix the version number check (10000 -> 100000)
- Fix query to get extended stats info for sort order
- Add handling [Pattern] e.g \dz stts*
- Add document and regression test for \dz

How about using \dX rather than \dz?

Thanks for your suggestion!
I'll replace it if I got consensus. :-D

Thanks,
Tatsuro Yamada

Attachments:

add_list_extended_stats_for_psql_poc2.patchtext/plain; charset=UTF-8; name=add_list_extended_stats_for_psql_poc2.patchDownload+142-0
#5Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tatsuro Yamada (#4)
Re: list of extended statistics on psql

Hi Julien and Pavel!

How about using \dX rather than \dz?

Thanks for your suggestion!
I'll replace it if I got consensus. :-D

How about using \dX rather than \dz?

Thanks for your suggestion!
I'll replace it if I got consensus. :-D

I re-read a help message of \d* commands and realized it's better to
use "\dX".
There are already cases where the commands differ due to differences
in case, so I did the same way. Please find attached patch. :-D

For example:
==========
\da[S] [PATTERN] list aggregates
\dA[+] [PATTERN] list access methods
==========

Attached patch uses "\dX" instead of "\dz":
==========
\dx[+] [PATTERN] list extensions
\dX [PATTERN] list extended statistics
==========

Results of regress test of the feature are the following:
==========
-- check printing info about extended statistics
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;
create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;

\dX
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
public | hoge | stts_hoge | col1, col2, col3 | t | t | t
public | t1 | stts_1 | a, b | f | t | f
public | t1 | stts_2 | a, b | t | t | f
public | t1 | stts_3 | a, b | t | t | t
public | t2 | stts_4 | b, c | t | t | t
(5 rows)

\dX stts_?
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+--------+---------+-----------+--------------+-----
public | t1 | stts_1 | a, b | f | t | f
public | t1 | stts_2 | a, b | t | t | f
public | t1 | stts_3 | a, b | t | t | t
public | t2 | stts_4 | b, c | t | t | t
(4 rows)

\dX *hoge
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
public | hoge | stts_hoge | col1, col2, col3 | t | t | t
(1 row)
==========

Thanks,
Tatsuro Yamada

Attachments:

add_list_extended_stats_for_psql_by_dX_command.patchtext/plain; charset=UTF-8; name=add_list_extended_stats_for_psql_by_dX_command.patchDownload+142-0
#6Julien Rouhaud
rjuju123@gmail.com
In reply to: Tatsuro Yamada (#5)
Re: list of extended statistics on psql

Hi Yamada-san,

On Thu, Aug 27, 2020 at 03:13:09PM +0900, Tatsuro Yamada wrote:

I re-read a help message of \d* commands and realized it's better to
use "\dX".
There are already cases where the commands differ due to differences
in case, so I did the same way. Please find attached patch. :-D
For example:
==========
\da[S] [PATTERN] list aggregates
\dA[+] [PATTERN] list access methods
==========

Attached patch uses "\dX" instead of "\dz":
==========
\dx[+] [PATTERN] list extensions
\dX [PATTERN] list extended statistics
==========

Thanks for updating the patch! This alias will probably be easier to remember.

Results of regress test of the feature are the following:
==========
-- check printing info about extended statistics
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;
create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;

\dX
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
public | hoge | stts_hoge | col1, col2, col3 | t | t | t
public | t1 | stts_1 | a, b | f | t | f
public | t1 | stts_2 | a, b | t | t | f
public | t1 | stts_3 | a, b | t | t | t
public | t2 | stts_4 | b, c | t | t | t
(5 rows)

\dX stts_?
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+--------+---------+-----------+--------------+-----
public | t1 | stts_1 | a, b | f | t | f
public | t1 | stts_2 | a, b | t | t | f
public | t1 | stts_3 | a, b | t | t | t
public | t2 | stts_4 | b, c | t | t | t
(4 rows)

\dX *hoge
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
public | hoge | stts_hoge | col1, col2, col3 | t | t | t
(1 row)
==========

Thanks also for the documentation and regression tests. This overall looks
good, I just have a two comments:

- there's a whitespace issue in the documentation part:

add_list_extended_stats_for_psql_by_dX_command.patch:10: tab in indent.
<varlistentry>
warning: 1 line adds whitespace errors.

- You're sorting the output on schema, table, extended statistics and columns
but I think the last one isn't required since extended statistics names are
unique.

#7Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Julien Rouhaud (#6)
Re: list of extended statistics on psql

Hi Julien!

Thanks also for the documentation and regression tests. This overall looks
good, I just have a two comments:

Thank you for reviewing the patch! :-D

- there's a whitespace issue in the documentation part:

add_list_extended_stats_for_psql_by_dX_command.patch:10: tab in indent.
<varlistentry>
warning: 1 line adds whitespace errors.

Oops, I forgot to use "git diff --check". I fixed it.

- You're sorting the output on schema, table, extended statistics and columns
but I think the last one isn't required since extended statistics names are
unique.

You are right.
The sort key "columns" was not necessary so I removed it.

Attached new patch includes the above two fixes:

- Fix whitespace issue in the documentation part
- Remove unnecessary sort key from the query
(ORDER BY 1, 2, 3, 4 -> ORDER BY 1, 2, 3)

Thanks,
Tatsuro Yamada

Attachments:

add_list_extended_stats_for_psql_by_dX_command_r2.patchtext/plain; charset=UTF-8; name=add_list_extended_stats_for_psql_by_dX_command_r2.patchDownload+142-0
#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tatsuro Yamada (#7)
Re: list of extended statistics on psql

+1 for the general idea, and +1 for \dX being the syntax to use

IMO the per-type columns should show both the type being enabled as
well as it being built.

(How many more stat types do we expect -- Tomas? I wonder if having one
column per type is going to scale in the long run.)

Also, the stat obj name column should be first, followed by a single
column listing both table and columns that it applies to. Keep in mind
that in the future we might want to add stats that cross multiple tables
-- that's why the CREATE syntax is the way it is. So we should give
room for that in psql's display too.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Alvaro Herrera (#8)
Re: list of extended statistics on psql

Hi Alvaro!

It's been ages since we created a progress reporting feature together. :-D

+1 good idea

+1 that's a good idea. Please add it to the next commitfest!

+1 for the general idea, and +1 for \dX being the syntax to use

Thank you for voting!

IMO the per-type columns should show both the type being enabled as

well as it being built.

Hmm. I'm not sure how to get the status (enabled or disabled) of
extended stats. :(
Could you explain it more?

Also, the stat obj name column should be first, followed by a single
column listing both table and columns that it applies to. Keep in mind
that in the future we might want to add stats that cross multiple tables
-- that's why the CREATE syntax is the way it is. So we should give
room for that in psql's display too.

I understand your suggestions are the following, right?

* The Current column order:
===================
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+--------+---------+-----------+--------------+-----
public | t1 | stts_1 | a, b | f | t | f
public | t1 | stts_2 | a, b | t | t | f
public | t1 | stts_3 | a, b | t | t | t
public | t2 | stts_4 | b, c | t | t | t
===================

* The suggested column order is like this:
===================
Name | Schema | Table | Columns | Ndistinct | Dependencies | MCV
-----------+--------+-------+------------------+-----------+--------------+-----
stts_1 | public | t1 | a, b | f | t | f
stts_2 | public | t1 | a, b | t | t | f
stts_3 | public | t1 | a, b | t | t | t
stts_4 | public | t2 | b, c | t | t | t
===================

* In the future, Extended stats that cross multiple tables will be
shown maybe... (t1, t2):
===================
Name | Schema | Table | Columns | Ndistinct | Dependencies | MCV
-----------+--------+--------+------------------+-----------+--------------+-----
stts_5 | public | t1, t2 | a, b | f | t | f
===================

If so, I can revise the column order as you suggested easily.
However, I have no idea how to show extended stats that cross
multiple tables and the status now.

I suppose that the current column order is sufficient if there is
no improvement of extended stats on PG14. Do you know any plan to
improve extended stats such as to allow it to cross multiple tables on PG14?

In addition,
Currently, I use this query to get Extended stats info from pg_statistic_ext.

SELECT
stxnamespace::pg_catalog.regnamespace AS "Schema",
c.relname AS "Table",
stxname AS "Name",
(SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
FROM pg_catalog.unnest(stxkeys) s(attnum)
JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
a.attnum = s.attnum AND NOT attisdropped)) AS "Columns",
'd' = any(stxkind) AS "Ndistinct",
'f' = any(stxkind) AS "Dependencies",
'm' = any(stxkind) AS "MCV"
FROM pg_catalog.pg_statistic_ext
INNER JOIN pg_catalog.pg_class c
ON stxrelid = c.oid
ORDER BY 1, 2, 3;

Thanks,
Tatsuro Yamada

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tatsuro Yamada (#9)
Re: list of extended statistics on psql

On 2020-Aug-28, Tatsuro Yamada wrote:

IMO the per-type columns should show both the type being enabled as
well as it being built.

Hmm. I'm not sure how to get the status (enabled or disabled) of
extended stats. :(
Could you explain it more?

pg_statistic_ext_data.stxdndistinct is not null if the stats have been
built. (I'm not sure whether there's an easier way to determine this.)

* The suggested column order is like this:
===================
Name | Schema | Table | Columns | Ndistinct | Dependencies | MCV
-----------+--------+-------+------------------+-----------+--------------+-----
stts_1 | public | t1 | a, b | f | t | f
stts_2 | public | t1 | a, b | t | t | f
stts_3 | public | t1 | a, b | t | t | t
stts_4 | public | t2 | b, c | t | t | t
===================

I suggest to do this

Name | Schema | Definition | Ndistinct | Dependencies | MCV
-----------+--------+--------------------------+-----------+--------------+-----
stts_1 | public | (a, b) FROM t1 | f | t | f

I suppose that the current column order is sufficient if there is
no improvement of extended stats on PG14. Do you know any plan to
improve extended stats such as to allow it to cross multiple tables on PG14?

I suggest that changing it in the future is going to be an uphill
battle, so better get it right from the get go, without requiring a
future restructure.

In addition,
Currently, I use this query to get Extended stats info from pg_statistic_ext.

Maybe something like this would do

SELECT
stxnamespace::pg_catalog.regnamespace AS "Schema",
stxname AS "Name",
format('%s FROM %s',
(SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
FROM pg_catalog.unnest(stxkeys) s(attnum)
JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
a.attnum = s.attnum AND NOT attisdropped)),
stxrelid::regclass) AS "Definition",
CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS "n-distinct",
CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS "functional dependencies",
CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv
FROM pg_catalog.pg_statistic_ext es
INNER JOIN pg_catalog.pg_class c
ON stxrelid = c.oid
LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid
ORDER BY 1, 2, 3;

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#8)
Re: list of extended statistics on psql

On Thu, Aug 27, 2020 at 07:53:23PM -0400, Alvaro Herrera wrote:

+1 for the general idea, and +1 for \dX being the syntax to use

IMO the per-type columns should show both the type being enabled as
well as it being built.

(How many more stat types do we expect -- Tomas? I wonder if having one
column per type is going to scale in the long run.)

I wouldn't expect a huge number of types. I can imagine maybe twice the
current number of types, but not much more. But I'm not sure the output
is easy to read even now ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#10)
Re: list of extended statistics on psql

On Thu, Aug 27, 2020 at 11:26:17PM -0400, Alvaro Herrera wrote:

On 2020-Aug-28, Tatsuro Yamada wrote:

IMO the per-type columns should show both the type being enabled as
well as it being built.

Hmm. I'm not sure how to get the status (enabled or disabled) of
extended stats. :(
Could you explain it more?

pg_statistic_ext_data.stxdndistinct is not null if the stats have been
built. (I'm not sure whether there's an easier way to determine this.)

It's the only way, I think. Which types were requested is stored in

pg_statistic_ext.stxkind

and what was built is in pg_statistic_ext_data. But if we want the
output to show both what was requested and which types were actually
built, that'll effectively double the number of columns needed :-(

Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#12)
Re: list of extended statistics on psql

On 2020-Aug-29, Tomas Vondra wrote:

But if we want the
output to show both what was requested and which types were actually
built, that'll effectively double the number of columns needed :-(

I was thinking it would be one column per type showing either disabled or enabled
or built. But another idea is to show one type per line that's at least
enabled.

Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.

\dX+ I suppose?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#13)
Re: list of extended statistics on psql

On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:

On 2020-Aug-29, Tomas Vondra wrote:

But if we want the
output to show both what was requested and which types were actually
built, that'll effectively double the number of columns needed :-(

I was thinking it would be one column per type showing either disabled or enabled
or built. But another idea is to show one type per line that's at least
enabled.

Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.

\dX+ I suppose?

Right. I've only used \d+ as an example of an existing command showing
sizes of the objects.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#14)
Re: list of extended statistics on psql

On 2020-Aug-30, Tomas Vondra wrote:

On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:

On 2020-Aug-29, Tomas Vondra wrote:

Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.

\dX+ I suppose?

Right. I've only used \d+ as an example of an existing command showing
sizes of the objects.

Yeah, I understood it that way too.

How can you measure the size of the stat objects in a query? Are you
thinking in pg_column_size()?

I wonder how to report that. Knowing that psql \-commands are not meant
for anything other than human consumption, maybe we can use a format()
string that says "built: %d bytes" when \dX+ is used (for each stat type),
and just "built" when \dX is used. What do people think about this?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#15)
Re: list of extended statistics on psql

On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:

On 2020-Aug-30, Tomas Vondra wrote:

On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:

On 2020-Aug-29, Tomas Vondra wrote:

Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.

\dX+ I suppose?

Right. I've only used \d+ as an example of an existing command showing
sizes of the objects.

Yeah, I understood it that way too.

How can you measure the size of the stat objects in a query? Are you
thinking in pg_column_size()?

Either that or simply length() on the bytea value.

I wonder how to report that. Knowing that psql \-commands are not meant
for anything other than human consumption, maybe we can use a format()
string that says "built: %d bytes" when \dX+ is used (for each stat type),
and just "built" when \dX is used. What do people think about this?

I'd use the same approach as \d+, i.e. a separate column with the size.
Maybe that'd mean too many columns, though.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#16)
Re: list of extended statistics on psql

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:

I wonder how to report that. Knowing that psql \-commands are not meant
for anything other than human consumption, maybe we can use a format()
string that says "built: %d bytes" when \dX+ is used (for each stat type),
and just "built" when \dX is used. What do people think about this?

Seems a little too cute to me.

I'd use the same approach as \d+, i.e. a separate column with the size.
Maybe that'd mean too many columns, though.

psql already has \d commands with so many columns that you pretty much
have to use \x mode to make them legible; \df+ for instance. I don't
mind if \dX+ is also in that territory. It'd be good though if plain
\dX can fit in a normal terminal window.

regards, tom lane

#18Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Alvaro Herrera (#10)
Re: list of extended statistics on psql

Hi Alvaro,

IMO the per-type columns should show both the type being enabled as
well as it being built.

Hmm. I'm not sure how to get the status (enabled or disabled) of
extended stats. :(
Could you explain it more?

pg_statistic_ext_data.stxdndistinct is not null if the stats have been
built. (I'm not sure whether there's an easier way to determine this.)

Ah.. I see! Thank you.

I suggest to do this

Name | Schema | Definition | Ndistinct | Dependencies | MCV
-----------+--------+--------------------------+-----------+--------------+-----
stts_1 | public | (a, b) FROM t1 | f | t | f

I suppose that the current column order is sufficient if there is
no improvement of extended stats on PG14. Do you know any plan to
improve extended stats such as to allow it to cross multiple tables on PG14?

I suggest that changing it in the future is going to be an uphill
battle, so better get it right from the get go, without requiring a
future restructure.

I understand your suggestions. I'll replace "Columns" and "Table" columns with "Definition" column.

Currently, I use this query to get Extended stats info from pg_statistic_ext.

Maybe something like this would do

SELECT
stxnamespace::pg_catalog.regnamespace AS "Schema",
stxname AS "Name",
format('%s FROM %s',
(SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
FROM pg_catalog.unnest(stxkeys) s(attnum)
JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
a.attnum = s.attnum AND NOT attisdropped)),
stxrelid::regclass) AS "Definition",
CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS "n-distinct",
CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS "functional dependencies",
CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv
FROM pg_catalog.pg_statistic_ext es
INNER JOIN pg_catalog.pg_class c
ON stxrelid = c.oid
LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid
ORDER BY 1, 2, 3;

Great! It helped me a lot to understand your suggestions correctly. Thanks. :-D
I got the below results by your query.

========
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;
create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;

insert into t1 select i,i from generate_series(1,100) i;
analyze t1;

Your query gave this result:

Schema | Name | Definition | n-distinct | functional dependencies | mcv
--------+-----------+----------------------------+--------------------+-------------------------+--------------------
public | stts_1 | a, b FROM t1 | | built |
public | stts_2 | a, b FROM t1 | built | built |
public | stts_3 | a, b FROM t1 | built | built | built
public | stts_4 | b, c FROM t2 | enabled, not built | enabled, not built | enabled, not built
public | stts_hoge | col1, col2, col3 FROM hoge | enabled, not built | enabled, not built | enabled, not built
(5 rows)
========

I guess "enabled, not built" is a little redundant. The status would better to
have three patterns: "built", "not built" or nothing (NULL) like these:

- "built": extended stats is defined and built (collected by analyze cmd)
- "not built": extended stats is defined but have not built yet
- nothing (NULL): extended stats is not defined

What do you think about it?

I will send a new patch including :

- Replace "Columns" and "Table" column with "Definition"
- Show the status (built/not built/null) of extended stats by using
pg_statistic_ext_data

Thanks,
Tatsuro Yamada

#19Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tom Lane (#17)
Re: list of extended statistics on psql

On 2020/08/31 1:59, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:

I wonder how to report that. Knowing that psql \-commands are not meant
for anything other than human consumption, maybe we can use a format()
string that says "built: %d bytes" when \dX+ is used (for each stat type),
and just "built" when \dX is used. What do people think about this?

Seems a little too cute to me.

I'd use the same approach as \d+, i.e. a separate column with the size.
Maybe that'd mean too many columns, though.

psql already has \d commands with so many columns that you pretty much
have to use \x mode to make them legible; \df+ for instance. I don't
mind if \dX+ is also in that territory. It'd be good though if plain
\dX can fit in a normal terminal window.

Hmm. How about these instead of "built: %d bytes"?
I added three columns (N_size, D_size, M_size) to show size. See below:

===================
postgres=# \dX
List of extended statistics
Schema | Name | Definition | N_distinct | Dependencies | Mcv
--------+-----------+----------------------------+------------+--------------+-----------
public | stts_1 | a, b FROM t1 | | built |
public | stts_2 | a, b FROM t1 | built | built |
public | stts_3 | a, b FROM t1 | built | built | built
public | stts_4 | b, c FROM t2 | not built | not built | not built
public | stts_hoge | col1, col2, col3 FROM hoge | not built | not built | not built
(5 rows)

postgres=# \dX+
List of extended statistics
Schema | Name | Definition | N_distinct | Dependencies | Mcv | N_size | D_size | M_size
--------+-----------+----------------------------+------------+--------------+-----------+--------+--------+--------
public | stts_1 | a, b FROM t1 | | built | | | 40 |
public | stts_2 | a, b FROM t1 | built | built | | 13 | 40 |
public | stts_3 | a, b FROM t1 | built | built | built | 13 | 40 | 6126
public | stts_4 | b, c FROM t2 | not built | not built | not built | | |
public | stts_hoge | col1, col2, col3 FROM hoge | not built | not built | not built | | |
===================

I used this query to get results of "\dX+".
===================
SELECT
stxnamespace::pg_catalog.regnamespace AS "Schema",
stxname AS "Name",
format('%s FROM %s',
(SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
FROM pg_catalog.unnest(stxkeys) s(attnum)
JOIN pg_catalog.pg_attribute a
ON (stxrelid = a.attrelid
AND a.attnum = s.attnum
AND NOT attisdropped)),
stxrelid::regclass) AS "Definition",
CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built'
WHEN 'd' = any(stxkind) THEN 'not built'
END AS "N_distinct",
CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built'
WHEN 'f' = any(stxkind) THEN 'not built'
END AS "Dependencies",
CASE WHEN esd.stxdmcv IS NOT NULL THEN 'built'
WHEN 'm' = any(stxkind) THEN 'not built'
END AS "Mcv",
pg_catalog.length(stxdndistinct) AS "N_size",
pg_catalog.length(stxddependencies) AS "D_size",
pg_catalog.length(stxdmcv) AS "M_size"
FROM pg_catalog.pg_statistic_ext es
INNER JOIN pg_catalog.pg_class c
ON stxrelid = c.oid
LEFT JOIN pg_catalog.pg_statistic_ext_data esd
ON es.oid = esd.stxoid
ORDER BY 1, 2;
===================

Attached patch includes:

- Replace "Columns" and "Table" column with "Definition"
- Show the status (built/not built/null) of extended stats by
using pg_statistic_ext_data
- Add "\dX+" command to show size of extended stats

Please find the attached file! :-D

Thanks,
Tatsuro Yamada

Attachments:

add_list_extended_stats_for_psql_by_dX_and_dXplus_r3.patchtext/plain; charset=UTF-8; name=add_list_extended_stats_for_psql_by_dX_and_dXplus_r3.patchDownload+204-0
#20Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#8)
Re: list of extended statistics on psql

On Thu, Aug 27, 2020 at 07:53:23PM -0400, Alvaro Herrera wrote:

+1 for the general idea, and +1 for \dX being the syntax to use

IMO the per-type columns should show both the type being enabled as
well as it being built.

(How many more stat types do we expect -- Tomas? I wonder if having one
column per type is going to scale in the long run.)

Also, the stat obj name column should be first, followed by a single
column listing both table and columns that it applies to. Keep in mind
that in the future we might want to add stats that cross multiple tables
-- that's why the CREATE syntax is the way it is. So we should give
room for that in psql's display too.

There's also a plan for CREATE STATISTICS to support expresion statistics, with
the statistics functionality of an expression index, but without the cost of
index-update on UPDATE/DELETE. That's Tomas' patch here:
https://commitfest.postgresql.org/29/2421/

I think that would compute ndistinct and MCV, same as indexes, but not
dependencies. To me, I think it's better if there's a single column showing
the "kinds" of statistics to be generated (stxkind), rather than a column for
each.

I'm not sure why the length of the stats lists cast as text is useful to show?
We don't have a slash-dee command to show the number of MCV or histogram in
traditional, 1-D stats in pg_statistic, right ? I think anybody wanting that
would learn to SELECT FROM pg_statistic*. Also, the length of the text output
isn't very meaningful ? If this is json, maybe you'd do something like this:
|SELECT a.stxdndistinct , COUNT(b) FROM pg_statistic_ext_data a , json_each(stxdndistinct::Json) AS b GROUP BY 1

I guess stxdmcv isn't json, but it seems especially meaningless to show
length() of its ::text, since we don't even "deserialize" the object to begin
with.

BTW, I've just started a new thread about displaying in psql \d the stats
target of target extended stats.

--
Justin

#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#16)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#21)
#23Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#22)
#24Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#21)
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#24)
#26Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#25)
#27Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tomas Vondra (#26)
#28Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Pavel Stehule (#27)
#29Michael Paquier
michael@paquier.xyz
In reply to: Tatsuro Yamada (#28)
#30Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#29)
#31Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Michael Paquier (#30)
#32Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Michael Paquier (#30)
#33Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tatsuro Yamada (#32)
#34Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#32)
#35Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#33)
#36Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#35)
#37Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#34)
#38Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tatsuro Yamada (#37)
#39Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#38)
#40Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#39)
#41Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tatsuro Yamada (#40)
#42Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#41)
#43Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#42)
#44Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tatsuro Yamada (#43)
#45Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tatsuro Yamada (#44)
#46Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#45)
#47Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#46)
#48Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#47)
#49Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#48)
#50Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#49)
#51Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#50)
#52Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#51)
#53Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#52)
#54Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#53)
#55Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#54)
#56Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#55)
#57Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tatsuro Yamada (#56)
#58Julien Rouhaud
rjuju123@gmail.com
In reply to: Tatsuro Yamada (#57)
#59Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Julien Rouhaud (#58)
#60Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#59)
#61Shinoda, Noriyoshi (PN Japan FSIP)
noriyoshi.shinoda@hpe.com
In reply to: Tomas Vondra (#60)
#62Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Shinoda, Noriyoshi (PN Japan FSIP) (#61)
#63Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#62)
#64Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#63)
#65Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Julien Rouhaud (#58)
#66Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#60)
#67Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#63)
#68Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Justin Pryzby (#64)
#69Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#67)
#70Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#69)
#71Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#70)
#72Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tatsuro Yamada (#70)
#73Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#71)
#74Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tatsuro Yamada (#73)
#75Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#74)
#76Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#75)
#77Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#75)
#78Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#77)
#79Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#78)
#80Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#79)
#81Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tatsuro Yamada (#80)
#82Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Tomas Vondra (#81)