ALTER INDEX ... ALTER COLUMN not present in dump

Started by Ronan Dunklau over 7 years ago, 28 messages (hackers, bugs)
#1Ronan Dunklau
ronan.dunklau@people-doc.com
hackersbugs

Hello,

It seems like ALTER INDEX ... ALTER COLUMN statements (for setting specific
statistics targets on functional indexes, for example) are not part of a
pg_dump.

It is not easily noticed, since everything seems to work normally until a
sub-par plan is chosen because of an error in cardinality estimates.

Regards,

--
Ronan Dunklau

#2Sergei Kornilov
In reply to: Ronan Dunklau (#1)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

Hello
Can you give a reproducible example?
I have ALTER TABLE ONLY (schema).(table) ALTER COLUMN (column) SET STATISTICS (target); in pg_dump output.

regards, Sergei

#3Ronan Dunklau
ronan.dunklau@people-doc.com
In reply to: Sergei Kornilov (#2)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

> Hello
> Can you give a reproducible example?
> I have ALTER TABLE ONLY (schema).(table) ALTER COLUMN (column) SET STATISTICS (target); in pg_dump output.
>
> regards, Sergei

Please note it is about ALTER INDEX, not ALTER TABLE.

Here is the reproducible example:

create table t1 (id int);
create index on t1 ((id + 2));
alter index t1_expr_idx alter column 1 set statistics 10000;

pg_dump output extract:

--
-- Name: t1_expr_idx; Type: INDEX; Schema: public; Owner: postgres
--

CREATE INDEX t1_expr_idx ON public.t1 USING btree (((id + 2)));

--
-- PostgreSQL database dump complete
--

#4Sergei Kornilov
In reply to: Ronan Dunklau (#3)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

Oh, I see.
I can reproduce this, and I did not find any SET STATISTICS code for indexes in the pg_dump source. It seems support for this clause was missed completely.

regards, Sergei

#5Adrien Nayrat
adrien.nayrat@anayrat.info
In reply to: Sergei Kornilov (#4)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On 11/15/18 12:20 PM, Sergei Kornilov wrote:

> Oh, I see.
> I can reproduce this, and I did not find any SET STATISTICS code for indexes in the pg_dump source. It seems support for this clause was missed completely.
>
> regards, Sergei

It seems we missed something in:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5b6d13eec72b960eb0f78542199380e49c8583d4;hp=e09db94c0a5f3b440d96c5c9e8e6c1638d1ec39f

Sorry :/

#6Michael Paquier
michael@paquier.xyz
In reply to: Adrien Nayrat (#5)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On Thu, Nov 15, 2018 at 12:46:08PM +0100, Adrien NAYRAT wrote:

> On 11/15/18 12:20 PM, Sergei Kornilov wrote:
>
>> I can reproduce this, and I did not find any SET STATISTICS code for indexes
>> in the pg_dump source. It seems support for this clause was missed completely.
>
> It seems we missed something in:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5b6d13eec72b960eb0f78542199380e49c8583d4;hp=e09db94c0a5f3b440d96c5c9e8e6c1638d1ec39f

Yes, that's a bug, and something that we should try to fix in v11.
--
Michael

#7Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#6)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On Fri, Nov 16, 2018 at 07:46:01AM +0900, Michael Paquier wrote:

> Yes, that's a bug, and something that we should try to fix in v11.

Okay, here are my notes. We need to do a couple of things here:
1) Add a new join to pg_attribute in getIndexes(), then add the
information for statistics and the associated column to IndxInfo after
parsing the gathered array using parsePGArray().
2) Add the extra ALTER INDEX commands to the queries creating the
objects in dumpIndex().
3) Add a test.
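
As a rough illustration of step 1, here is a minimal C sketch of splitting a simple array literal (for example, the text form of an aggregated list of statistics targets) into its items. The names split_simple_array() and free_items() are hypothetical. This is not pg_dump's parsePGArray(), and it assumes the literal contains no quoting, escapes, or nested arrays.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free the first n strings stored in items[]. */
void
free_items(char **items, int n)
{
    for (int i = 0; i < n; i++)
        free(items[i]);
}

/*
 * Split a literal such as "{1000,500}" into newly allocated strings.
 * Returns the number of items stored in items[], or -1 if the literal is
 * malformed, has more than max_items entries, or memory runs out.
 * On success the caller releases the strings with free_items().
 */
int
split_simple_array(const char *literal, char **items, int max_items)
{
    size_t      len;
    const char *p;
    int         n = 0;

    assert(literal != NULL && items != NULL);

    len = strlen(literal);
    if (len < 2 || literal[0] != '{' || literal[len - 1] != '}')
        return -1;
    if (len == 2)
        return 0;               /* "{}" is an empty array */

    p = literal + 1;
    for (;;)
    {
        size_t      itemlen = strcspn(p, ",}");

        if (n >= max_items)
            break;              /* too many items for the caller's array */
        items[n] = malloc(itemlen + 1);
        if (items[n] == NULL)
            break;              /* out of memory */
        memcpy(items[n], p, itemlen);
        items[n][itemlen] = '\0';
        n++;

        p += itemlen;
        if (*p == '}')
        {
            if (p == literal + len - 1)
                return n;       /* reached the closing brace */
            break;              /* stray '}' inside the literal */
        }
        p++;                    /* skip the comma */
    }

    free_items(items, n);
    return -1;
}
```

With two such arrays of equal length (one identifying the index columns, one holding their statistics targets), step 2 reduces to looping over the pairs and appending one ALTER INDEX command per pair.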

A good thing is that when ALTER INDEX .. SET STATISTICS is applied on an
index of a partitioned table, the statement is not cascaded to the
existing partitions. We may want in the future to support ONLY and make
the inheritance automatic. But that's another topic, and the fix for
v11 should be surgical.
--
Michael

#8Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#7)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On Fri, Nov 16, 2018 at 09:45:05AM +0900, Michael Paquier wrote:

> A good thing is that when ALTER INDEX .. SET STATISTICS is applied on an
> index of a partitioned table, the statement is not cascaded to the
> existing partitions. We may want in the future to support ONLY and make
> the inheritance automatic. But that's another topic, and the fix for
> v11 should be surgical.

And here you go as attached. Looking closer, in v10 and older versions,
ALTER INDEX SET STATISTICS is able to work as it is an alias of ALTER
TABLE. The attached patch does not bother generating the ALTER INDEX
queries for v10 and older and feeds queries with empty strings. Perhaps
we should support that case? Or would the lack of complaints be a
sufficient argument to care only about v11 and newer versions? I would
tend to think that supporting only v11 and above is enough. Thoughts
are welcome.
--
Michael

Attachments:

dump-alter-index-stats.patch (text/x-diff; charset=us-ascii, +99 -6)
#9Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#8)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On Fri, Nov 16, 2018 at 05:32:52PM +0900, Michael Paquier wrote:

> And here you go as attached. Looking closer, in v10 and older versions,
> ALTER INDEX SET STATISTICS is able to work as it is an alias of ALTER
> TABLE. The attached patch does not bother generating the ALTER INDEX
> queries for v10 and older and feeds queries with empty strings. Perhaps
> we should support that case? Or would the lack of complaints be a
> sufficient argument to care only about v11 and newer versions? I would
> tend to think that supporting only v11 and above is enough. Thoughts
> are welcome.

+       appendPQExpBuffer(q, "ALTER COLUMN %s ",
+                         indstatcolsarray[j]);

This forgot to wrap the column name with fmtId(). Friday hits hard...
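
To illustrate why the wrapping matters: an identifier that is not a plain lower-case name has to be double-quoted, with embedded double quotes doubled, before it is pasted into SQL text. A minimal sketch of that rule follows. quote_ident_sketch() is a hypothetical stand-in, not the real fmtId(), and unlike the real function it does not check whether the name is an SQL keyword.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if ident can be emitted unquoted (keywords are NOT checked). */
static int
is_plain_ident(const char *ident)
{
    if (!((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_'))
        return 0;
    for (const char *p = ident + 1; *p; p++)
    {
        if (!((*p >= 'a' && *p <= 'z') ||
              (*p >= '0' && *p <= '9') ||
              *p == '_'))
            return 0;
    }
    return 1;
}

/*
 * Copy ident into out, adding double quotes when needed.
 * Returns 0 on success, -1 if out (of size outsize) is too small.
 */
int
quote_ident_sketch(const char *ident, char *out, size_t outsize)
{
    size_t      pos = 0;

    assert(ident != NULL && out != NULL);

    if (is_plain_ident(ident))
    {
        size_t      len = strlen(ident);

        if (len + 1 > outsize)
            return -1;
        memcpy(out, ident, len + 1);
        return 0;
    }

    /* Worst case: every character doubled, plus two quotes and the NUL. */
    if (2 * strlen(ident) + 3 > outsize)
        return -1;

    out[pos++] = '"';
    for (const char *p = ident; *p; p++)
    {
        if (*p == '"')
            out[pos++] = '"';   /* double any embedded quote */
        out[pos++] = *p;
    }
    out[pos++] = '"';
    out[pos] = '\0';
    return 0;
}
```

A generated name such as expr passes through unchanged, which is probably why the missing call was easy to overlook in testing.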
--
Michael

Attachments:

dump-alter-index-stats-v2.patch (text/x-diff; charset=us-ascii, +99 -6)
#10Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#9)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On Fri, Nov 16, 2018 at 10:31:03PM +0900, Michael Paquier wrote:

> On Fri, Nov 16, 2018 at 05:32:52PM +0900, Michael Paquier wrote:
>
>> And here you go as attached. Looking closer, in v10 and older versions,
>> ALTER INDEX SET STATISTICS is able to work as it is an alias of ALTER
>> TABLE. The attached patch does not bother generating the ALTER INDEX
>> queries for v10 and older and feeds queries with empty strings. Perhaps
>> we should support that case? Or would the lack of complaints be a
>> sufficient argument to care only about v11 and newer versions? I would
>> tend to think that supporting only v11 and above is enough. Thoughts
>> are welcome.

So, any thoughts about this patch? I would still like to move on with
supporting this set of queries only for v11 and above, as that is
clearly documented on the ALTER INDEX page only from that point.
--
Michael

#11Adrien Nayrat
adrien.nayrat@anayrat.info
In reply to: Michael Paquier (#10)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On 11/21/18 1:04 AM, Michael Paquier wrote:

> On Fri, Nov 16, 2018 at 10:31:03PM +0900, Michael Paquier wrote:
>
>> On Fri, Nov 16, 2018 at 05:32:52PM +0900, Michael Paquier wrote:
>>
>>> And here you go as attached. Looking closer, in v10 and older versions,
>>> ALTER INDEX SET STATISTICS is able to work as it is an alias of ALTER
>>> TABLE. The attached patch does not bother generating the ALTER INDEX
>>> queries for v10 and older and feeds queries with empty strings. Perhaps
>>> we should support that case? Or would the lack of complaints be a
>>> sufficient argument to care only about v11 and newer versions? I would
>>> tend to think that supporting only v11 and above is enough. Thoughts
>>> are welcome.
>
> So, any thoughts about this patch? I would still like to move on with
> supporting this set of queries only for v11 and above, as that is
> clearly documented on the ALTER INDEX page only from that point.
> --
> Michael

Sorry, I will try to look at this soon, but it is relatively new for me.
At least, someone else with more knowledge should also look.

Thanks for addressing this issue.

#12Adrien Nayrat
adrien.nayrat@anayrat.info
In reply to: Adrien Nayrat (#11)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On 11/21/18 9:09 AM, Adrien Nayrat wrote:

>> So, any thoughts about this patch? I would still like to move on with
>> supporting this set of queries only for v11 and above, as that is
>> clearly documented on the ALTER INDEX page only from that point.
>> --
>> Michael
>
> Sorry, I will try to look at this soon, but it is relatively new for me.
> At least, someone else with more knowledge should also look.
>
> Thanks for addressing this issue.

I did some tests and it looks good to me. I had a look at the code and
nothing stood out to me, but I am not the most qualified to say that.

Thanks Michael for the fix!

#13Michael Paquier
michael@paquier.xyz
In reply to: Adrien Nayrat (#12)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On Wed, Nov 21, 2018 at 07:24:18PM +0100, Adrien NAYRAT wrote:

> I did some tests and it looks good to me. I had a look at the code and
> nothing stood out to me, but I am not the most qualified to say that.

Thanks Adrien for the review. For now I have added it to the next
commit fest:
https://commitfest.postgresql.org/21/1884/
--
Michael

#14Amul Sul
sulamul@gmail.com
In reply to: Michael Paquier (#13)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

dump-alter-index-stats-v2.patch looks reasonable to me, passing it on to a committer.

The new status of this patch is: Ready for Committer

#15Michael Paquier
michael@paquier.xyz
In reply to: Amul Sul (#14)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On Fri, Dec 14, 2018 at 08:08:45AM +0000, Amul Sul wrote:

> dump-alter-index-stats-v2.patch looks reasonable to me, passing it on to a committer.
>
> The new status of this patch is: Ready for Committer

Thanks Amul for the review. I had the chance to look at this patch
again, and I have reread the original thread which added the new
grammar for ALTER INDEX SET STATISTICS:
/messages/by-id/CAPpHfdsSYo6xpt0F=ngAdqMPFJJhC7zApde9h1qwkdpHpwFisA@mail.gmail.com

As Alexander and others state on this thread, it looks a bit weird to
use internally-produced attribute names in those SQL queries, which is
why the new grammar has been added. At the same time, it looks more
solid to me to represent the dumps with those column names instead of
column numbers. Tom, Alexander, as you have commented on the original
thread, perhaps you have an opinion here to share?

For now, attached is an updated patch which has a simplified test list
in the TAP test. I have also added two free() calls for the arrays
getting allocated when statistics are present for an index.
--
Michael

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#15)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

Michael Paquier <michael@paquier.xyz> writes:

> As Alexander and others state on this thread, it looks a bit weird to
> use internally-produced attribute names in those SQL queries, which is
> why the new grammar has been added. At the same time, it looks more
> solid to me to represent the dumps with those column names instead of
> column numbers. Tom, Alexander, as you have commented on the original
> thread, perhaps you have an opinion here to share?

The problem is that there's no guarantee that the new server would
generate the same column name for an index column --- and I don't
want to try to lock things down so much that there would be such
a guarantee. So I'd go with the column-number form.

As an example:

regression=# create table foo (expr int, f1 int, f2 int);
CREATE TABLE
regression=# create index on foo ((f1+f2));
CREATE INDEX
regression=# create index on foo (expr, (f1+f2));
CREATE INDEX
regression=# \d foo
                Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 expr   | integer |           |          |
 f1     | integer |           |          |
 f2     | integer |           |          |
Indexes:
    "foo_expr_expr1_idx" btree (expr, (f1 + f2))
    "foo_expr_idx" btree ((f1 + f2))

regression=# \d foo_expr_idx
      Index "public.foo_expr_idx"
 Column |  Type   | Key? | Definition
--------+---------+------+------------
 expr   | integer | yes  | (f1 + f2)
btree, for table "public.foo"

regression=# \d foo_expr_expr1_idx
   Index "public.foo_expr_expr1_idx"
 Column |  Type   | Key? | Definition
--------+---------+------+------------
 expr   | integer | yes  | expr
 expr1  | integer | yes  | (f1 + f2)
btree, for table "public.foo"

If we were to rename the "foo.expr" column at this point,
and then dump and reload, the expression column in the
second index would presumably acquire the name "expr"
not "expr1", because "expr" would no longer be taken.
So if pg_dump were to try to use that index column name
in ALTER ... SET STATISTICS, it'd fail.
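
The naming behavior described above can be modeled with a small C sketch. choose_index_colname() is a hypothetical helper and only a simplified model (take a base name, then append 1, 2, ... while it collides with an earlier column of the same index). It is not the server's code, and it assumes expression columns start from the base name "expr" as in the session shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define COLNAME_LEN 64

/*
 * Choose a name for index column number pos (0-based) and store it in
 * chosen[pos].  base is the table column's name for a plain column, or
 * "expr" for an expression.  chosen[0..pos-1] hold the names already
 * assigned to the earlier columns of the same index.
 */
void
choose_index_colname(const char *base, char chosen[][COLNAME_LEN], int pos)
{
    assert(base != NULL && pos >= 0);
    assert(strlen(base) + 12 < COLNAME_LEN);    /* room for a numeric suffix */

    for (int suffix = 0;; suffix++)
    {
        char        candidate[COLNAME_LEN];
        int         taken = 0;

        if (suffix == 0)
            snprintf(candidate, sizeof candidate, "%s", base);
        else
            snprintf(candidate, sizeof candidate, "%s%d", base, suffix);

        for (int i = 0; i < pos; i++)
        {
            if (strcmp(chosen[i], candidate) == 0)
                taken = 1;
        }

        if (!taken)
        {
            strcpy(chosen[pos], candidate);
            return;
        }
    }
}
```

In this model, an index on (expr, (f1 + f2)) gets the names expr and expr1, while the same index rebuilt after the table column has been renamed gets the new column name followed by expr. A dump that refers to expr1 by name would then point at a column that no longer exists, whereas column number 2 stays valid.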

regards, tom lane

#17Amul Sul
sulamul@gmail.com
In reply to: Michael Paquier (#15)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On Mon, Dec 17, 2018 at 10:44 AM Michael Paquier <michael@paquier.xyz> wrote:

> On Fri, Dec 14, 2018 at 08:08:45AM +0000, Amul Sul wrote:
>
>> dump-alter-index-stats-v2.patch looks reasonable to me, passing it on to a committer.
>>
>> The new status of this patch is: Ready for Committer
>
> Thanks Amul for the review. I had the chance to look at this patch
> again, and I have reread the original thread which added the new
> grammar for ALTER INDEX SET STATISTICS:
> /messages/by-id/CAPpHfdsSYo6xpt0F=ngAdqMPFJJhC7zApde9h1qwkdpHpwFisA@mail.gmail.com
>
> As Alexander and others state on this thread, it looks a bit weird to
> use internally-produced attribute names in those SQL queries, which is
> why the new grammar has been added. At the same time, it looks more
> solid to me to represent the dumps with those column names instead of
> column numbers. Tom, Alexander, as you have commented on the original
> thread, perhaps you have an opinion here to share?

Oh I see -- understood the problem, I missed this discussion, thanks for
letting me know.

> For now, attached is an updated patch which has a simplified test list
> in the TAP test. I have also added two free() calls for the arrays
> getting allocated when statistics are present for an index.

Patch is missing?

Regards,
Amul

#18Michael Paquier
michael@paquier.xyz
In reply to: Amul Sul (#17)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On Mon, Dec 17, 2018 at 10:59:08AM +0530, amul sul wrote:

> Patch is missing?

Here you go. The patch is still using attribute names, which is a bad
idea ;)
--
Michael

Attachments:

dump-alter-index-stats-v3.patch (text/x-diff; charset=us-ascii, +81 -6)
#19Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#16)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On Mon, Dec 17, 2018 at 12:24:15AM -0500, Tom Lane wrote:

> If we were to rename the "foo.expr" column at this point,
> and then dump and reload, the expression column in the
> second index would presumably acquire the name "expr"
> not "expr1", because "expr" would no longer be taken.
> So if pg_dump were to try to use that index column name
> in ALTER ... SET STATISTICS, it'd fail.

Good point, thanks! I did not think about the case where a table uses
an attribute name matching what would be generated for indexes.

So this settles the argument that we had better not do anything before
v11. Switching the dump code to use column numbers has not proved to be
complicated as only the query and some comments had to be tweaked.
Attached is an updated patch, and I am switching back the patch to
"Needs review" to have an extra pair of eyes look at that in case I
missed something.
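
For readers following along, here is a minimal sketch of what a command in the column-number form could look like when generated from C. format_index_stats_cmd() is a hypothetical helper, not code from the attached patch. It assumes the index name passed in is already schema-qualified and quoted, and the exact text pg_dump emits may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one statement of the form
 *   ALTER INDEX <index> ALTER COLUMN <n> SET STATISTICS <target>;
 * into out.  Returns 0 on success, -1 if the result does not fit.
 */
int
format_index_stats_cmd(char *out, size_t outsize,
                       const char *qualified_index,
                       int colnum, int target)
{
    int         n;

    assert(out != NULL && qualified_index != NULL);
    assert(strlen(qualified_index) > 0);
    assert(colnum >= 1);        /* index columns are numbered from 1 */

    n = snprintf(out, outsize,
                 "ALTER INDEX %s ALTER COLUMN %d SET STATISTICS %d;\n",
                 qualified_index, colnum, target);

    /* snprintf reports the length it needed; reject truncated output. */
    if (n < 0 || (size_t) n >= outsize)
        return -1;
    return 0;
}
```

Because the column is addressed by position, the statement does not depend on whatever name the restoring server generates for the expression column.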
--
Michael

Attachments:

dump-alter-index-stats-v4.patch (text/x-diff; charset=us-ascii, +85 -6)
#20Amul Sul
sulamul@gmail.com
In reply to: Michael Paquier (#19)
hackersbugs
Re: ALTER INDEX ... ALTER COLUMN not present in dump

On Mon, Dec 17, 2018 at 11:54 AM Michael Paquier <michael@paquier.xyz> wrote:

> On Mon, Dec 17, 2018 at 12:24:15AM -0500, Tom Lane wrote:
>
>> If we were to rename the "foo.expr" column at this point,
>> and then dump and reload, the expression column in the
>> second index would presumably acquire the name "expr"
>> not "expr1", because "expr" would no longer be taken.
>> So if pg_dump were to try to use that index column name
>> in ALTER ... SET STATISTICS, it'd fail.
>
> Good point, thanks! I did not think about the case where a table uses
> an attribute name matching what would be generated for indexes.
>
> So this settles the argument that we had better not do anything before
> v11. Switching the dump code to use column numbers has not proved to be
> complicated as only the query and some comments had to be tweaked.
> Attached is an updated patch, and I am switching back the patch to
> "Needs review" to have an extra pair of eyes look at that in case I
> missed something.

+1, will have a look, thanks.

Regards,
Amul

#21Amul Sul
sulamul@gmail.com
In reply to: Michael Paquier (#19)
hackersbugs
#22Michael Paquier
michael@paquier.xyz
In reply to: Amul Sul (#21)
hackersbugs
#23Amul Sul
sulamul@gmail.com
In reply to: Michael Paquier (#22)
hackersbugs
#24Michael Paquier
michael@paquier.xyz
In reply to: Amul Sul (#23)
hackersbugs
#25Amul Sul
sulamul@gmail.com
In reply to: Michael Paquier (#24)
hackersbugs
#26Amul Sul
sulamul@gmail.com
In reply to: Amul Sul (#20)
hackersbugs
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amul Sul (#26)
hackersbugs
#28Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#27)
hackersbugs