list of extended statistics on psql
Hi!
I created a POC patch that allows showing a list of extended statistics by
"\dz" command on psql. I believe this feature helps DBA and users who
would like to know all extended statistics easily. :-D
I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
than "\dz" but they were already assigned. Therefore I used "\dz"
instead of them.
Please find the attached patch.
Any comments are welcome!
For Example:
=======================
CREATE TABLE t1 (a INT, b INT);
CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
ANALYZE t1;
CREATE TABLE t2 (a INT, b INT, c INT);
CREATE STATISTICS stts4 ON b, c FROM t2;
ANALYZE t2;
postgres=# \dz
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-------+---------+-----------+--------------+-----
public | t1 | stts1 | a, b | f | t | f
public | t1 | stts2 | a, b | t | t | f
public | t1 | stts3 | a, b | t | t | t
public | t2 | stts4 | b, c | t | t | t
(4 rows)
postgres=# \?
...
\dy [PATTERN] list event triggers
\dz [PATTERN] list extended statistics
\l[+] [PATTERN] list databases
...
=======================
For now, I haven't written a document and regression test for that.
I'll create it later.
Thanks,
Tatsuro Yamada
Attachments:
add_list_extended_stats_for_psql_poc1.patchtext/plain; charset=UTF-8; name=add_list_extended_stats_for_psql_poc1.patchDownload+68-0
po 24. 8. 2020 v 5:23 odesílatel Tatsuro Yamada <
tatsuro.yamada.tf@nttcom.co.jp> napsal:
Hi!
I created a POC patch that allows showing a list of extended statistics by
"\dz" command on psql. I believe this feature helps DBA and users who
would like to know all extended statistics easily. :-DI have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
than "\dz" but they were already assigned. Therefore I used "\dz"
instead of them.Please find the attached patch.
Any comments are welcome!For Example:
=======================
CREATE TABLE t1 (a INT, b INT);
CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
ANALYZE t1;CREATE TABLE t2 (a INT, b INT, c INT);
CREATE STATISTICS stts4 ON b, c FROM t2;
ANALYZE t2;postgres=# \dz
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-------+---------+-----------+--------------+-----
public | t1 | stts1 | a, b | f | t | f
public | t1 | stts2 | a, b | t | t | f
public | t1 | stts3 | a, b | t | t | t
public | t2 | stts4 | b, c | t | t | t
(4 rows)postgres=# \?
...
\dy [PATTERN] list event triggers
\dz [PATTERN] list extended statistics
\l[+] [PATTERN] list databases
...
=======================For now, I haven't written a document and regression test for that.
I'll create it later.
+1 good idea
Pavel
Show quoted text
Thanks,
Tatsuro Yamada
On Mon, Aug 24, 2020 at 6:13 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
po 24. 8. 2020 v 5:23 odesílatel Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> napsal:
Hi!
I created a POC patch that allows showing a list of extended statistics by
"\dz" command on psql. I believe this feature helps DBA and users who
would like to know all extended statistics easily. :-DI have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
than "\dz" but they were already assigned. Therefore I used "\dz"
instead of them.Please find the attached patch.
Any comments are welcome!For Example:
=======================
CREATE TABLE t1 (a INT, b INT);
CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
ANALYZE t1;CREATE TABLE t2 (a INT, b INT, c INT);
CREATE STATISTICS stts4 ON b, c FROM t2;
ANALYZE t2;postgres=# \dz
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-------+---------+-----------+--------------+-----
public | t1 | stts1 | a, b | f | t | f
public | t1 | stts2 | a, b | t | t | f
public | t1 | stts3 | a, b | t | t | t
public | t2 | stts4 | b, c | t | t | t
(4 rows)postgres=# \?
...
\dy [PATTERN] list event triggers
\dz [PATTERN] list extended statistics
\l[+] [PATTERN] list databases
...
=======================For now, I haven't written a document and regression test for that.
I'll create it later.+1 good idea
+1 that's a good idea. Please add it to the next commitfest!
You have a typo:
+ if (pset.sversion < 10000)
+ {
+ char sverbuf[32];
+
+ pg_log_error("The server (version %s) does not support
extended statistics.",
+ formatPGVersionNumber(pset.sversion, false,
+ sverbuf, sizeof(sverbuf)));
+ return true;
+ }
the version test is missing a 0, the feature looks otherwise ok.
How about using \dX rather than \dz?
Hi!
+1 good idea
+1 that's a good idea. Please add it to the next commitfest!
Thanks!
You have a typo:
+ if (pset.sversion < 10000) + { + char sverbuf[32]; + + pg_log_error("The server (version %s) does not support extended statistics.", + formatPGVersionNumber(pset.sversion, false, + sverbuf, sizeof(sverbuf))); + return true; + }the version test is missing a 0, the feature looks otherwise ok.
Ouch, I fixed on the attached patch.
The new patch includes:
- Fix the version number check (10000 -> 100000)
- Fix query to get extended stats info for sort order
- Add handling [Pattern] e.g \dz stts*
- Add document and regression test for \dz
How about using \dX rather than \dz?
Thanks for your suggestion!
I'll replace it if I got consensus. :-D
Thanks,
Tatsuro Yamada
Attachments:
add_list_extended_stats_for_psql_poc2.patchtext/plain; charset=UTF-8; name=add_list_extended_stats_for_psql_poc2.patchDownload+142-0
Hi Julien and Pavel!
How about using \dX rather than \dz?
Thanks for your suggestion!
I'll replace it if I got consensus. :-D
How about using \dX rather than \dz?
Thanks for your suggestion!
I'll replace it if I got consensus. :-D
I re-read a help message of \d* commands and realized it's better to
use "\dX".
There are already cases where the commands differ due to differences
in case, so I did the same way. Please find attached patch. :-D
For example:
==========
\da[S] [PATTERN] list aggregates
\dA[+] [PATTERN] list access methods
==========
Attached patch uses "\dX" instead of "\dz":
==========
\dx[+] [PATTERN] list extensions
\dX [PATTERN] list extended statistics
==========
Results of regress test of the feature are the following:
==========
-- check printing info about extended statistics
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;
create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;
\dX
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
public | hoge | stts_hoge | col1, col2, col3 | t | t | t
public | t1 | stts_1 | a, b | f | t | f
public | t1 | stts_2 | a, b | t | t | f
public | t1 | stts_3 | a, b | t | t | t
public | t2 | stts_4 | b, c | t | t | t
(5 rows)
\dX stts_?
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+--------+---------+-----------+--------------+-----
public | t1 | stts_1 | a, b | f | t | f
public | t1 | stts_2 | a, b | t | t | f
public | t1 | stts_3 | a, b | t | t | t
public | t2 | stts_4 | b, c | t | t | t
(4 rows)
\dX *hoge
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
public | hoge | stts_hoge | col1, col2, col3 | t | t | t
(1 row)
==========
Thanks,
Tatsuro Yamada
Attachments:
add_list_extended_stats_for_psql_by_dX_command.patchtext/plain; charset=UTF-8; name=add_list_extended_stats_for_psql_by_dX_command.patchDownload+142-0
Hi Yamada-san,
On Thu, Aug 27, 2020 at 03:13:09PM +0900, Tatsuro Yamada wrote:
I re-read a help message of \d* commands and realized it's better to
use "\dX".
There are already cases where the commands differ due to differences
in case, so I did the same way. Please find attached patch. :-D
For example:
==========
\da[S] [PATTERN] list aggregates
\dA[+] [PATTERN] list access methods
==========Attached patch uses "\dX" instead of "\dz":
==========
\dx[+] [PATTERN] list extensions
\dX [PATTERN] list extended statistics
==========
Thanks for updating the patch! This alias will probably be easier to remember.
Results of regress test of the feature are the following:
==========
-- check printing info about extended statistics
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;
create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;\dX
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
public | hoge | stts_hoge | col1, col2, col3 | t | t | t
public | t1 | stts_1 | a, b | f | t | f
public | t1 | stts_2 | a, b | t | t | f
public | t1 | stts_3 | a, b | t | t | t
public | t2 | stts_4 | b, c | t | t | t
(5 rows)\dX stts_?
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+--------+---------+-----------+--------------+-----
public | t1 | stts_1 | a, b | f | t | f
public | t1 | stts_2 | a, b | t | t | f
public | t1 | stts_3 | a, b | t | t | t
public | t2 | stts_4 | b, c | t | t | t
(4 rows)\dX *hoge
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
public | hoge | stts_hoge | col1, col2, col3 | t | t | t
(1 row)
==========
Thanks also for the documentation and regression tests. This overall looks
good, I just have a two comments:
- there's a whitespace issue in the documentation part:
add_list_extended_stats_for_psql_by_dX_command.patch:10: tab in indent.
<varlistentry>
warning: 1 line adds whitespace errors.
- You're sorting the output on schema, table, extended statistics and columns
but I think the last one isn't required since extended statistics names are
unique.
Hi Julien!
Thanks also for the documentation and regression tests. This overall looks
good, I just have a two comments:
Thank you for reviewing the patch! :-D
- there's a whitespace issue in the documentation part:
add_list_extended_stats_for_psql_by_dX_command.patch:10: tab in indent.
<varlistentry>
warning: 1 line adds whitespace errors.
Oops, I forgot to use "git diff --check". I fixed it.
- You're sorting the output on schema, table, extended statistics and columns
but I think the last one isn't required since extended statistics names are
unique.
You are right.
The sort key "columns" was not necessary so I removed it.
Attached new patch includes the above two fixes:
- Fix whitespace issue in the documentation part
- Remove unnecessary sort key from the query
(ORDER BY 1, 2, 3, 4 -> ORDER BY 1, 2, 3)
Thanks,
Tatsuro Yamada
Attachments:
add_list_extended_stats_for_psql_by_dX_command_r2.patchtext/plain; charset=UTF-8; name=add_list_extended_stats_for_psql_by_dX_command_r2.patchDownload+142-0
+1 for the general idea, and +1 for \dX being the syntax to use
IMO the per-type columns should show both the type being enabled as
well as it being built.
(How many more stat types do we expect -- Tomas? I wonder if having one
column per type is going to scale in the long run.)
Also, the stat obj name column should be first, followed by a single
column listing both table and columns that it applies to. Keep in mind
that in the future we might want to add stats that cross multiple tables
-- that's why the CREATE syntax is the way it is. So we should give
room for that in psql's display too.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Alvaro!
It's been ages since we created a progress reporting feature together. :-D
+1 good idea
+1 that's a good idea. Please add it to the next commitfest!
+1 for the general idea, and +1 for \dX being the syntax to use
Thank you for voting!
IMO the per-type columns should show both the type being enabled as
well as it being built.
Hmm. I'm not sure how to get the status (enabled or disabled) of
extended stats. :(
Could you explain it more?
Also, the stat obj name column should be first, followed by a single
column listing both table and columns that it applies to. Keep in mind
that in the future we might want to add stats that cross multiple tables
-- that's why the CREATE syntax is the way it is. So we should give
room for that in psql's display too.
I understand your suggestions are the following, right?
* The Current column order:
===================
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+--------+---------+-----------+--------------+-----
public | t1 | stts_1 | a, b | f | t | f
public | t1 | stts_2 | a, b | t | t | f
public | t1 | stts_3 | a, b | t | t | t
public | t2 | stts_4 | b, c | t | t | t
===================
* The suggested column order is like this:
===================
Name | Schema | Table | Columns | Ndistinct | Dependencies | MCV
-----------+--------+-------+------------------+-----------+--------------+-----
stts_1 | public | t1 | a, b | f | t | f
stts_2 | public | t1 | a, b | t | t | f
stts_3 | public | t1 | a, b | t | t | t
stts_4 | public | t2 | b, c | t | t | t
===================
* In the future, Extended stats that cross multiple tables will be
shown maybe... (t1, t2):
===================
Name | Schema | Table | Columns | Ndistinct | Dependencies | MCV
-----------+--------+--------+------------------+-----------+--------------+-----
stts_5 | public | t1, t2 | a, b | f | t | f
===================
If so, I can revise the column order as you suggested easily.
However, I have no idea how to show extended stats that cross
multiple tables and the status now.
I suppose that the current column order is sufficient if there is
no improvement of extended stats on PG14. Do you know any plan to
improve extended stats such as to allow it to cross multiple tables on PG14?
In addition,
Currently, I use this query to get Extended stats info from pg_statistic_ext.
SELECT
stxnamespace::pg_catalog.regnamespace AS "Schema",
c.relname AS "Table",
stxname AS "Name",
(SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
FROM pg_catalog.unnest(stxkeys) s(attnum)
JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
a.attnum = s.attnum AND NOT attisdropped)) AS "Columns",
'd' = any(stxkind) AS "Ndistinct",
'f' = any(stxkind) AS "Dependencies",
'm' = any(stxkind) AS "MCV"
FROM pg_catalog.pg_statistic_ext
INNER JOIN pg_catalog.pg_class c
ON stxrelid = c.oid
ORDER BY 1, 2, 3;
Thanks,
Tatsuro Yamada
On 2020-Aug-28, Tatsuro Yamada wrote:
IMO the per-type columns should show both the type being enabled as
well as it being built.Hmm. I'm not sure how to get the status (enabled or disabled) of
extended stats. :(
Could you explain it more?
pg_statistic_ext_data.stxdndistinct is not null if the stats have been
built. (I'm not sure whether there's an easier way to determine this.)
* The suggested column order is like this:
===================
Name | Schema | Table | Columns | Ndistinct | Dependencies | MCV
-----------+--------+-------+------------------+-----------+--------------+-----
stts_1 | public | t1 | a, b | f | t | f
stts_2 | public | t1 | a, b | t | t | f
stts_3 | public | t1 | a, b | t | t | t
stts_4 | public | t2 | b, c | t | t | t
===================
I suggest to do this
Name | Schema | Definition | Ndistinct | Dependencies | MCV
-----------+--------+--------------------------+-----------+--------------+-----
stts_1 | public | (a, b) FROM t1 | f | t | f
I suppose that the current column order is sufficient if there is
no improvement of extended stats on PG14. Do you know any plan to
improve extended stats such as to allow it to cross multiple tables on PG14?
I suggest that changing it in the future is going to be an uphill
battle, so better get it right from the get go, without requiring a
future restructure.
In addition,
Currently, I use this query to get Extended stats info from pg_statistic_ext.
Maybe something like this would do
SELECT
stxnamespace::pg_catalog.regnamespace AS "Schema",
stxname AS "Name",
format('%s FROM %s',
(SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
FROM pg_catalog.unnest(stxkeys) s(attnum)
JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
a.attnum = s.attnum AND NOT attisdropped)),
stxrelid::regclass) AS "Definition",
CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS "n-distinct",
CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS "functional dependencies",
CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv
FROM pg_catalog.pg_statistic_ext es
INNER JOIN pg_catalog.pg_class c
ON stxrelid = c.oid
LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid
ORDER BY 1, 2, 3;
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Aug 27, 2020 at 07:53:23PM -0400, Alvaro Herrera wrote:
+1 for the general idea, and +1 for \dX being the syntax to use
IMO the per-type columns should show both the type being enabled as
well as it being built.(How many more stat types do we expect -- Tomas? I wonder if having one
column per type is going to scale in the long run.)
I wouldn't expect a huge number of types. I can imagine maybe twice the
current number of types, but not much more. But I'm not sure the output
is easy to read even now ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Aug 27, 2020 at 11:26:17PM -0400, Alvaro Herrera wrote:
On 2020-Aug-28, Tatsuro Yamada wrote:
IMO the per-type columns should show both the type being enabled as
well as it being built.Hmm. I'm not sure how to get the status (enabled or disabled) of
extended stats. :(
Could you explain it more?pg_statistic_ext_data.stxdndistinct is not null if the stats have been
built. (I'm not sure whether there's an easier way to determine this.)
It's the only way, I think. Which types were requested is stored in
pg_statistic_ext.stxkind
and what was built is in pg_statistic_ext_data. But if we want the
output to show both what was requested and which types were actually
built, that'll effectively double the number of columns needed :-(
Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Aug-29, Tomas Vondra wrote:
But if we want the
output to show both what was requested and which types were actually
built, that'll effectively double the number of columns needed :-(
I was thinking it would be one column per type showing either disabled or enabled
or built. But another idea is to show one type per line that's at least
enabled.
Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.
\dX+ I suppose?
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:
On 2020-Aug-29, Tomas Vondra wrote:
But if we want the
output to show both what was requested and which types were actually
built, that'll effectively double the number of columns needed :-(I was thinking it would be one column per type showing either disabled or enabled
or built. But another idea is to show one type per line that's at least
enabled.Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.\dX+ I suppose?
Right. I've only used \d+ as an example of an existing command showing
sizes of the objects.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Aug-30, Tomas Vondra wrote:
On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:
On 2020-Aug-29, Tomas Vondra wrote:
Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.\dX+ I suppose?
Right. I've only used \d+ as an example of an existing command showing
sizes of the objects.
Yeah, I understood it that way too.
How can you measure the size of the stat objects in a query? Are you
thinking in pg_column_size()?
I wonder how to report that. Knowing that psql \-commands are not meant
for anything other than human consumption, maybe we can use a format()
string that says "built: %d bytes" when \dX+ is used (for each stat type),
and just "built" when \dX is used. What do people think about this?
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:
On 2020-Aug-30, Tomas Vondra wrote:
On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:
On 2020-Aug-29, Tomas Vondra wrote:
Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.\dX+ I suppose?
Right. I've only used \d+ as an example of an existing command showing
sizes of the objects.Yeah, I understood it that way too.
How can you measure the size of the stat objects in a query? Are you
thinking in pg_column_size()?
Either that or simply length() on the bytea value.
I wonder how to report that. Knowing that psql \-commands are not meant
for anything other than human consumption, maybe we can use a format()
string that says "built: %d bytes" when \dX+ is used (for each stat type),
and just "built" when \dX is used. What do people think about this?
I'd use the same approach as \d+, i.e. a separate column with the size.
Maybe that'd mean too many columns, though.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:
I wonder how to report that. Knowing that psql \-commands are not meant
for anything other than human consumption, maybe we can use a format()
string that says "built: %d bytes" when \dX+ is used (for each stat type),
and just "built" when \dX is used. What do people think about this?
Seems a little too cute to me.
I'd use the same approach as \d+, i.e. a separate column with the size.
Maybe that'd mean too many columns, though.
psql already has \d commands with so many columns that you pretty much
have to use \x mode to make them legible; \df+ for instance. I don't
mind if \dX+ is also in that territory. It'd be good though if plain
\dX can fit in a normal terminal window.
regards, tom lane
Hi Alvaro,
IMO the per-type columns should show both the type being enabled as
well as it being built.Hmm. I'm not sure how to get the status (enabled or disabled) of
extended stats. :(
Could you explain it more?pg_statistic_ext_data.stxdndistinct is not null if the stats have been
built. (I'm not sure whether there's an easier way to determine this.)
Ah.. I see! Thank you.
I suggest to do this
Name | Schema | Definition | Ndistinct | Dependencies | MCV
-----------+--------+--------------------------+-----------+--------------+-----
stts_1 | public | (a, b) FROM t1 | f | t | fI suppose that the current column order is sufficient if there is
no improvement of extended stats on PG14. Do you know any plan to
improve extended stats such as to allow it to cross multiple tables on PG14?I suggest that changing it in the future is going to be an uphill
battle, so better get it right from the get go, without requiring a
future restructure.
I understand your suggestions. I'll replace "Columns" and "Table" columns with "Definition" column.
Currently, I use this query to get Extended stats info from pg_statistic_ext.
Maybe something like this would do
SELECT
stxnamespace::pg_catalog.regnamespace AS "Schema",
stxname AS "Name",
format('%s FROM %s',
(SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
FROM pg_catalog.unnest(stxkeys) s(attnum)
JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
a.attnum = s.attnum AND NOT attisdropped)),
stxrelid::regclass) AS "Definition",
CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS "n-distinct",
CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS "functional dependencies",
CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv
FROM pg_catalog.pg_statistic_ext es
INNER JOIN pg_catalog.pg_class c
ON stxrelid = c.oid
LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid
ORDER BY 1, 2, 3;
Great! It helped me a lot to understand your suggestions correctly. Thanks. :-D
I got the below results by your query.
========
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;
create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;
insert into t1 select i,i from generate_series(1,100) i;
analyze t1;
Your query gave this result:
Schema | Name | Definition | n-distinct | functional dependencies | mcv
--------+-----------+----------------------------+--------------------+-------------------------+--------------------
public | stts_1 | a, b FROM t1 | | built |
public | stts_2 | a, b FROM t1 | built | built |
public | stts_3 | a, b FROM t1 | built | built | built
public | stts_4 | b, c FROM t2 | enabled, not built | enabled, not built | enabled, not built
public | stts_hoge | col1, col2, col3 FROM hoge | enabled, not built | enabled, not built | enabled, not built
(5 rows)
========
I guess "enabled, not built" is a little redundant. The status would better to
have three patterns: "built", "not built" or nothing (NULL) like these:
- "built": extended stats is defined and built (collected by analyze cmd)
- "not built": extended stats is defined but have not built yet
- nothing (NULL): extended stats is not defined
What do you think about it?
I will send a new patch including :
- Replace "Columns" and "Table" column with "Definition"
- Show the status (built/not built/null) of extended stats by using
pg_statistic_ext_data
Thanks,
Tatsuro Yamada
On 2020/08/31 1:59, Tom Lane wrote:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:
I wonder how to report that. Knowing that psql \-commands are not meant
for anything other than human consumption, maybe we can use a format()
string that says "built: %d bytes" when \dX+ is used (for each stat type),
and just "built" when \dX is used. What do people think about this?Seems a little too cute to me.
I'd use the same approach as \d+, i.e. a separate column with the size.
Maybe that'd mean too many columns, though.psql already has \d commands with so many columns that you pretty much
have to use \x mode to make them legible; \df+ for instance. I don't
mind if \dX+ is also in that territory. It'd be good though if plain
\dX can fit in a normal terminal window.
Hmm. How about these instead of "built: %d bytes"?
I added three columns (N_size, D_size, M_size) to show size. See below:
===================
postgres=# \dX
List of extended statistics
Schema | Name | Definition | N_distinct | Dependencies | Mcv
--------+-----------+----------------------------+------------+--------------+-----------
public | stts_1 | a, b FROM t1 | | built |
public | stts_2 | a, b FROM t1 | built | built |
public | stts_3 | a, b FROM t1 | built | built | built
public | stts_4 | b, c FROM t2 | not built | not built | not built
public | stts_hoge | col1, col2, col3 FROM hoge | not built | not built | not built
(5 rows)
postgres=# \dX+
List of extended statistics
Schema | Name | Definition | N_distinct | Dependencies | Mcv | N_size | D_size | M_size
--------+-----------+----------------------------+------------+--------------+-----------+--------+--------+--------
public | stts_1 | a, b FROM t1 | | built | | | 40 |
public | stts_2 | a, b FROM t1 | built | built | | 13 | 40 |
public | stts_3 | a, b FROM t1 | built | built | built | 13 | 40 | 6126
public | stts_4 | b, c FROM t2 | not built | not built | not built | | |
public | stts_hoge | col1, col2, col3 FROM hoge | not built | not built | not built | | |
===================
I used this query to get results of "\dX+".
===================
SELECT
stxnamespace::pg_catalog.regnamespace AS "Schema",
stxname AS "Name",
format('%s FROM %s',
(SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
FROM pg_catalog.unnest(stxkeys) s(attnum)
JOIN pg_catalog.pg_attribute a
ON (stxrelid = a.attrelid
AND a.attnum = s.attnum
AND NOT attisdropped)),
stxrelid::regclass) AS "Definition",
CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built'
WHEN 'd' = any(stxkind) THEN 'not built'
END AS "N_distinct",
CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built'
WHEN 'f' = any(stxkind) THEN 'not built'
END AS "Dependencies",
CASE WHEN esd.stxdmcv IS NOT NULL THEN 'built'
WHEN 'm' = any(stxkind) THEN 'not built'
END AS "Mcv",
pg_catalog.length(stxdndistinct) AS "N_size",
pg_catalog.length(stxddependencies) AS "D_size",
pg_catalog.length(stxdmcv) AS "M_size"
FROM pg_catalog.pg_statistic_ext es
INNER JOIN pg_catalog.pg_class c
ON stxrelid = c.oid
LEFT JOIN pg_catalog.pg_statistic_ext_data esd
ON es.oid = esd.stxoid
ORDER BY 1, 2;
===================
Attached patch includes:
- Replace "Columns" and "Table" column with "Definition"
- Show the status (built/not built/null) of extended stats by
using pg_statistic_ext_data
- Add "\dX+" command to show size of extended stats
Please find the attached file! :-D
Thanks,
Tatsuro Yamada
Attachments:
add_list_extended_stats_for_psql_by_dX_and_dXplus_r3.patchtext/plain; charset=UTF-8; name=add_list_extended_stats_for_psql_by_dX_and_dXplus_r3.patchDownload+204-0
On Thu, Aug 27, 2020 at 07:53:23PM -0400, Alvaro Herrera wrote:
+1 for the general idea, and +1 for \dX being the syntax to use
IMO the per-type columns should show both the type being enabled as
well as it being built.(How many more stat types do we expect -- Tomas? I wonder if having one
column per type is going to scale in the long run.)Also, the stat obj name column should be first, followed by a single
column listing both table and columns that it applies to. Keep in mind
that in the future we might want to add stats that cross multiple tables
-- that's why the CREATE syntax is the way it is. So we should give
room for that in psql's display too.
There's also a plan for CREATE STATISTICS to support expresion statistics, with
the statistics functionality of an expression index, but without the cost of
index-update on UPDATE/DELETE. That's Tomas' patch here:
https://commitfest.postgresql.org/29/2421/
I think that would compute ndistinct and MCV, same as indexes, but not
dependencies. To me, I think it's better if there's a single column showing
the "kinds" of statistics to be generated (stxkind), rather than a column for
each.
I'm not sure why the length of the stats lists cast as text is useful to show?
We don't have a slash-dee command to show the number of MCV or histogram in
traditional, 1-D stats in pg_statistic, right ? I think anybody wanting that
would learn to SELECT FROM pg_statistic*. Also, the length of the text output
isn't very meaningful ? If this is json, maybe you'd do something like this:
|SELECT a.stxdndistinct , COUNT(b) FROM pg_statistic_ext_data a , json_each(stxdndistinct::Json) AS b GROUP BY 1
I guess stxdmcv isn't json, but it seems especially meaningless to show
length() of its ::text, since we don't even "deserialize" the object to begin
with.
BTW, I've just started a new thread about displaying in psql \d the stats
target of target extended stats.
--
Justin