Most-common value docs in PG 12

Started by Bruce Momjianover 6 years ago11 messagesdocs
Jump to latest
#1Bruce Momjian
bruce@momjian.us

Our docs for most-common values in PG 12 has:

--> CREATE STATISTICS stts3 (mcv) ON state, city FROM zipcodes;

ANALYZE zipcodes;

SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts3';

index | values | nulls | frequency | base_frequency
-------+------------------------+-------+-----------+----------------
--> 0 | {Washington, DC} | {f,f} | 0.003467 | 2.7e-05
1 | {Apo, AE} | {f,f} | 0.003067 | 1.9e-05
2 | {Houston, TX} | {f,f} | 0.002167 | 0.000133
3 | {El Paso, TX} | {f,f} | 0.002 | 0.000113

It seems pg_mcv_list_items() reports the column names in the order they
appear in the table, not in the order they appear in the CREATE
STATISTICS statement. Same for psql \d:

\d zipcodes
Table "public.zipcodes"
Column | Type | Collation | Nullable | Default
---------+------+-----------+----------+---------
city | text | | |
state | text | | |
zipcode | text | | |
Statistics objects:
--> "public"."stts3" (mcv) ON city, state FROM zipcodes

If this is so, why don't we show the CREATE STATISTICS example as
city/state, and not state/city?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Bruce Momjian (#1)
Re: Most-common value docs in PG 12

On Wed, Aug 28, 2019 at 12:22:38PM -0400, Bruce Momjian wrote:

Our docs for most-common values in PG 12 has:

--> CREATE STATISTICS stts3 (mcv) ON state, city FROM zipcodes;

ANALYZE zipcodes;

SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts3';

index | values | nulls | frequency | base_frequency
-------+------------------------+-------+-----------+----------------
--> 0 | {Washington, DC} | {f,f} | 0.003467 | 2.7e-05
1 | {Apo, AE} | {f,f} | 0.003067 | 1.9e-05
2 | {Houston, TX} | {f,f} | 0.002167 | 0.000133
3 | {El Paso, TX} | {f,f} | 0.002 | 0.000113

It seems pg_mcv_list_items() reports the column names in the order they
appear in the table, not in the order they appear in the CREATE
STATISTICS statement. Same for psql \d:

\d zipcodes
Table "public.zipcodes"
Column | Type | Collation | Nullable | Default
---------+------+-----------+----------+---------
city | text | | |
state | text | | |
zipcode | text | | |
Statistics objects:
--> "public"."stts3" (mcv) ON city, state FROM zipcodes

If this is so, why don't we show the CREATE STATISTICS example as
city/state, and not state/city?

Yes, we deduplicate the attributes and store them sorted by attnum. I'm
not sure it makes sense to change the example to match this order, which
is mostly an implementation detail, though. It might be better to point
out the order may not exactly match CREATE STATISTICS, and point users to
what e.g. "\d" shows (because that will show the order as stored in the
system catalog).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Bruce Momjian
bruce@momjian.us
In reply to: Tomas Vondra (#2)
Re: Most-common value docs in PG 12

On Wed, Aug 28, 2019 at 08:25:41PM +0200, Tomas Vondra wrote:

On Wed, Aug 28, 2019 at 12:22:38PM -0400, Bruce Momjian wrote:

If this is so, why don't we show the CREATE STATISTICS example as
city/state, and not state/city?

Yes, we deduplicate the attributes and store them sorted by attnum. I'm
not sure it makes sense to change the example to match this order, which
is mostly an implementation detail, though. It might be better to point
out the order may not exactly match CREATE STATISTICS, and point users to
what e.g. "\d" shows (because that will show the order as stored in the
system catalog).

OK, how is this patch? I didn't mention psql since I think everyone
expects psql to show all information about tables and indexes.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Attachments:

mcv.difftext/x-diff; charset=us-asciiDownload+5-4
#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#3)
Re: Most-common value docs in PG 12

On 2019-Aug-30, Bruce Momjian wrote:

OK, how is this patch? I didn't mention psql since I think everyone
expects psql to show all information about tables and indexes.

Why would you change perform.sgml? It seems unnecessary; the commands
shown work fine.

This part seems okay:

diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index ec23a154d6..5b583aacb4 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -98,7 +98,8 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="parameter">statistics_na
<listitem>
<para>
The name of a table column to be covered by the computed statistics.
-      At least two column names must be given.
+      At least two column names must be given;  the order of the column names
+      is insignificant.
</para>
</listitem>
</varlistentry>

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#4)
Re: Most-common value docs in PG 12

On Wed, Sep 11, 2019 at 06:15:22PM -0300, Alvaro Herrera wrote:

On 2019-Aug-30, Bruce Momjian wrote:

OK, how is this patch? I didn't mention psql since I think everyone
expects psql to show all information about tables and indexes.

Why would you change perform.sgml? It seems unnecessary; the commands
shown work fine.

I realize they work fine, but the ordering in the examples not matching
the defined order suggests that ordering matters, but it does not.

---------------------------------------------------------------------------

This part seems okay:

diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index ec23a154d6..5b583aacb4 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -98,7 +98,8 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="parameter">statistics_na
<listitem>
<para>
The name of a table column to be covered by the computed statistics.
-      At least two column names must be given.
+      At least two column names must be given;  the order of the column names
+      is insignificant.
</para>
</listitem>
</varlistentry>

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#5)
Re: Most-common value docs in PG 12

On 2019-Sep-26, Bruce Momjian wrote:

On Wed, Sep 11, 2019 at 06:15:22PM -0300, Alvaro Herrera wrote:

On 2019-Aug-30, Bruce Momjian wrote:

OK, how is this patch? I didn't mention psql since I think everyone
expects psql to show all information about tables and indexes.

Why would you change perform.sgml? It seems unnecessary; the commands
shown work fine.

I realize they work fine, but the ordering in the examples not matching
the defined order suggests that ordering matters, but it does not.

Well, I mean exactly the other way around: the fact that the orders
don't match illustrates that the order is not important. And that is
reinforced by the explanation indicating explicitly that it does not
matter:

<para>
The name of a table column to be covered by the computed statistics.
-      At least two column names must be given.
+      At least two column names must be given;  the order of the column names
+      is insignificant.
</para>

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#6)
Re: Most-common value docs in PG 12

On Thu, Sep 26, 2019 at 05:17:55PM -0300, Alvaro Herrera wrote:

On 2019-Sep-26, Bruce Momjian wrote:

On Wed, Sep 11, 2019 at 06:15:22PM -0300, Alvaro Herrera wrote:

On 2019-Aug-30, Bruce Momjian wrote:

OK, how is this patch? I didn't mention psql since I think everyone
expects psql to show all information about tables and indexes.

Why would you change perform.sgml? It seems unnecessary; the commands
shown work fine.

I realize they work fine, but the ordering in the examples not matching
the defined order suggests that ordering matters, but it does not.

Well, I mean exactly the other way around: the fact that the orders
don't match illustrates that the order is not important. And that is
reinforced by the explanation indicating explicitly that it does not
matter:

Uh, people normally list things in defined order, so you would usually
not list them in non-defined order unless there is a purpose. Doing
that just to illustrate the order doesn't matter seems odd.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#8Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Bruce Momjian (#7)
Re: Most-common value docs in PG 12

On Thu, Sep 26, 2019 at 04:20:59PM -0400, Bruce Momjian wrote:

On Thu, Sep 26, 2019 at 05:17:55PM -0300, Alvaro Herrera wrote:

On 2019-Sep-26, Bruce Momjian wrote:

On Wed, Sep 11, 2019 at 06:15:22PM -0300, Alvaro Herrera wrote:

On 2019-Aug-30, Bruce Momjian wrote:

OK, how is this patch? I didn't mention psql since I think everyone
expects psql to show all information about tables and indexes.

Why would you change perform.sgml? It seems unnecessary; the commands
shown work fine.

I realize they work fine, but the ordering in the examples not matching
the defined order suggests that ordering matters, but it does not.

Well, I mean exactly the other way around: the fact that the orders
don't match illustrates that the order is not important. And that is
reinforced by the explanation indicating explicitly that it does not
matter:

Uh, people normally list things in defined order, so you would usually
not list them in non-defined order unless there is a purpose. Doing
that just to illustrate the order doesn't matter seems odd.

Well, that assumes there is a definition, and I don't think the zipcodes
table is defined anywhere. So how do you know in what order are those
columns defined?

Now, maybe the table should be defined somewhere in perform.sgml - I
don't recall why exactly I chose not to do that, maybe because there is
no universal definition (one country uses text, another number, ...).

I do however agree that had there been such definition, it's probably
natural to list columns in the same order. We know the order is not
important, the proposed patch states that explicitly, but this just
feels natural.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Bruce Momjian
bruce@momjian.us
In reply to: Tomas Vondra (#8)
Re: Most-common value docs in PG 12

On Thu, Sep 26, 2019 at 11:03:54PM +0200, Tomas Vondra wrote:

On Thu, Sep 26, 2019 at 04:20:59PM -0400, Bruce Momjian wrote:

Uh, people normally list things in defined order, so you would usually
not list them in non-defined order unless there is a purpose. Doing
that just to illustrate the order doesn't matter seems odd.

Well, that assumes there is a definition, and I don't think the zipcodes
table is defined anywhere. So how do you know in what order are those
columns defined?

In the USA, it is usually specific to general, i.e., city, state.

Now, maybe the table should be defined somewhere in perform.sgml - I
don't recall why exactly I chose not to do that, maybe because there is
no universal definition (one country uses text, another number, ...)

Yeah, doesn't seem worth adding.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#10Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Bruce Momjian (#9)
Re: Most-common value docs in PG 12

On Thu, Sep 26, 2019 at 05:31:07PM -0400, Bruce Momjian wrote:

On Thu, Sep 26, 2019 at 11:03:54PM +0200, Tomas Vondra wrote:

On Thu, Sep 26, 2019 at 04:20:59PM -0400, Bruce Momjian wrote:

Uh, people normally list things in defined order, so you would usually
not list them in non-defined order unless there is a purpose. Doing
that just to illustrate the order doesn't matter seems odd.

Well, that assumes there is a definition, and I don't think the zipcodes
table is defined anywhere. So how do you know in what order are those
columns defined?

In the USA, it is usually specific to general, i.e., city, state.

I'd probably define it the same way, but for example the zipcode data
sets I usually use for my talks [1]http://download.geonames.org/export/zip/ defines it like this:

postal code : varchar(20)
place name : varchar(180)
admin name1 : 1. order subdivision (state) varchar(100)
admin code1 : 1. order subdivision (state) varchar(20)
admin name2 : 2. order subdivision (county/province) varchar(100)
admin code2 : 2. order subdivision (county/province) varchar(20)
admin name3 : 3. order subdivision (community) varchar(100)
admin code3 : 3. order subdivision (community) varchar(20)
latitude : estimated latitude (wgs84)
longitude : estimated longitude (wgs84)
accuracy : accuracy of lat/lng

so in this case it's a bit of a mix of specific vs. general first.

[1]: http://download.geonames.org/export/zip/

Now, maybe the table should be defined somewhere in perform.sgml - I
don't recall why exactly I chose not to do that, maybe because there is
no universal definition (one country uses text, another number, ...)

Yeah, doesn't seem worth adding.

OK.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Bruce Momjian
bruce@momjian.us
In reply to: Tomas Vondra (#10)
Re: Most-common value docs in PG 12

On Fri, Sep 27, 2019 at 01:30:49PM +0200, Tomas Vondra wrote:

Now, maybe the table should be defined somewhere in perform.sgml - I
don't recall why exactly I chose not to do that, maybe because there is
no universal definition (one country uses text, another number, ...)

Yeah, doesn't seem worth adding.

OK.

Patch applied through PG 12.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +