Alias collision in `refresh materialized view concurrently`

Started by Mathis Rudolfabout 5 years ago20 messageshackers

mathis.rudolf@credativ.de

about 5 years ago

Hello,

we had a Customer-Report in which `refresh materialized view
CONCURRENTLY` failed with: `ERROR: column reference "mv" is ambiguous`

They're using `mv` as an alias for one column and this is causing a
collision with an internal alias. They also made it reproducible like this:
```
create materialized view testmv as select 'asdas' mv; --ok
create unique index on testmv (mv); --ok
refresh materialized view testmv; --ok
refresh materialized view CONCURRENTLY testmv; ---BAM!
```

```
ERROR: column reference "mv" is ambiguous
LINE 1: ...alog.=) mv.mv AND newdata OPERATOR(pg_catalog.*=) mv) WHERE ...
^
QUERY: CREATE TEMP TABLE pg_temp_4.pg_temp_218322_2 AS SELECT mv.ctid
AS tid, newdata FROM public.testmv mv FULL JOIN pg_temp_4.pg_temp_218322
newdata ON (newdata.mv OPERATOR(pg_catalog.=) mv.mv AND newdata
OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid
```

The corresponding Code is in `matview.c` in function
`refresh_by_match_merge`. With adding a prefix like `_pg_internal_` we
could make collisions pretty unlikely, without intrusive changes.

The appended patch does this change for the aliases `mv`, `newdata` and
`newdata2`.

Kind regards,
Mathis

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

about 5 years ago

In reply to: Mathis Rudolf (#1)

Re: Alias collision in `refresh materialized view concurrently`

On Wed, May 19, 2021 at 5:33 PM Mathis Rudolf <mathis.rudolf@credativ.de> wrote:

Hello,

we had a Customer-Report in which `refresh materialized view
CONCURRENTLY` failed with: `ERROR: column reference "mv" is ambiguous`

They're using `mv` as an alias for one column and this is causing a
collision with an internal alias. They also made it reproducible like this:
```
create materialized view testmv as select 'asdas' mv; --ok
create unique index on testmv (mv); --ok
refresh materialized view testmv; --ok
refresh materialized view CONCURRENTLY testmv; ---BAM!
```

```
ERROR: column reference "mv" is ambiguous
LINE 1: ...alog.=) mv.mv AND newdata OPERATOR(pg_catalog.*=) mv) WHERE ...
^
QUERY: CREATE TEMP TABLE pg_temp_4.pg_temp_218322_2 AS SELECT mv.ctid
AS tid, newdata FROM public.testmv mv FULL JOIN pg_temp_4.pg_temp_218322
newdata ON (newdata.mv OPERATOR(pg_catalog.=) mv.mv AND newdata
OPERATOR(pg_catalog.*=) mv) WHERE newdata IS NULL OR mv IS NULL ORDER BY tid
```

The corresponding Code is in `matview.c` in function
`refresh_by_match_merge`. With adding a prefix like `_pg_internal_` we
could make collisions pretty unlikely, without intrusive changes.

The appended patch does this change for the aliases `mv`, `newdata` and
`newdata2`.

I think it's better to have some random name, see below. We could
either use "OIDNewHeap" or "MyBackendId" to make those column names
unique and almost unguessable. So, something like "pg_temp1_XXXX",
"pg_temp2_XXXX" or "pg_temp3_XXXX" and so on would be better IMO.

snprintf(NewHeapName, sizeof(NewHeapName), "pg_temp_%u", OIDOldHeap);
snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId);

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Bernd Helmle

mailings@oopsware.de

about 5 years ago

In reply to: Bharath Rupireddy (#2)

Re: Alias collision in `refresh materialized view concurrently`

Am Mittwoch, dem 19.05.2021 um 18:06 +0530 schrieb Bharath Rupireddy:

The corresponding Code is in `matview.c` in function
`refresh_by_match_merge`. With adding a prefix like `_pg_internal_`
we
could make collisions pretty unlikely, without intrusive changes.

The appended patch does this change for the aliases `mv`, `newdata`
and
`newdata2`.

I think it's better to have some random name, see below. We could
either use "OIDNewHeap" or "MyBackendId" to make those column names
unique and almost unguessable. So, something like "pg_temp1_XXXX",
"pg_temp2_XXXX" or "pg_temp3_XXXX" and so on would be better IMO.

snprintf(NewHeapName, sizeof(NewHeapName), "pg_temp_%u",
OIDOldHeap);
snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d",
MyBackendId);

Hmm, it's an idea, but this can also lead to pretty random failures if
you have an unlucky user who had the same idea in its generating query
tool than the backend, no? Not sure if that's really better.

With the current implementation of REFRESH MATERIALIZED VIEW
CONCURRENTLY we always have the problem of possible collisions here,
you'd never get out of this area without analyzing the whole query for
such collisions.

"mv" looks like a very common alias (i use it all over the time when
testing or playing around with materialized views, so i'm wondering why
i didn't see this issue already myself). So the risk here for such a
collision looks very high. We can try to lower this risk by choosing an
alias name, which is not so common. With a static alias however you get
a static error condition, not something that fails here and then.

--
Thanks,
Bernd

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

about 5 years ago

In reply to: Bernd Helmle (#3)

Re: Alias collision in `refresh materialized view concurrently`

On Thu, May 20, 2021 at 7:52 PM Bernd Helmle <mailings@oopsware.de> wrote:

Am Mittwoch, dem 19.05.2021 um 18:06 +0530 schrieb Bharath Rupireddy:

The corresponding Code is in `matview.c` in function
`refresh_by_match_merge`. With adding a prefix like `_pg_internal_`
we
could make collisions pretty unlikely, without intrusive changes.

The appended patch does this change for the aliases `mv`, `newdata`
and
`newdata2`.

I think it's better to have some random name, see below. We could
either use "OIDNewHeap" or "MyBackendId" to make those column names
unique and almost unguessable. So, something like "pg_temp1_XXXX",
"pg_temp2_XXXX" or "pg_temp3_XXXX" and so on would be better IMO.

snprintf(NewHeapName, sizeof(NewHeapName), "pg_temp_%u",
OIDOldHeap);
snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d",
MyBackendId);

Hmm, it's an idea, but this can also lead to pretty random failures if
you have an unlucky user who had the same idea in its generating query
tool than the backend, no? Not sure if that's really better.

With the current implementation of REFRESH MATERIALIZED VIEW
CONCURRENTLY we always have the problem of possible collisions here,
you'd never get out of this area without analyzing the whole query for
such collisions.

"mv" looks like a very common alias (i use it all over the time when
testing or playing around with materialized views, so i'm wondering why
i didn't see this issue already myself). So the risk here for such a
collision looks very high. We can try to lower this risk by choosing an
alias name, which is not so common. With a static alias however you get
a static error condition, not something that fails here and then.

Another idea is to use random() function to generate required number
of uint32 random values(refresh_by_match_merge might need 3 values to
replace newdata, newdata2 and mv) and use the names like
pg_temp_rmv_<<rand_no1>>, pg_temp_rmv_<<rand_no2>> and so on. This
would make the name unguessable. Note that we use this in
choose_dsm_implementation, dsm_impl_posix.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Bharath Rupireddy (#4)

Re: Alias collision in `refresh materialized view concurrently`

On Thu, May 20, 2021 at 09:14:45PM +0530, Bharath Rupireddy wrote:

On Thu, May 20, 2021 at 7:52 PM Bernd Helmle <mailings@oopsware.de> wrote:

"mv" looks like a very common alias (i use it all over the time when
testing or playing around with materialized views, so i'm wondering why
i didn't see this issue already myself). So the risk here for such a
collision looks very high. We can try to lower this risk by choosing an
alias name, which is not so common. With a static alias however you get
a static error condition, not something that fails here and then.

Another idea is to use random() function to generate required number
of uint32 random values(refresh_by_match_merge might need 3 values to
replace newdata, newdata2 and mv) and use the names like
pg_temp_rmv_<<rand_no1>>, pg_temp_rmv_<<rand_no2>> and so on. This
would make the name unguessable. Note that we use this in
choose_dsm_implementation, dsm_impl_posix.

I am not sure that I see the point of using a random() number here
while the backend ID, or just the PID, would easily provide enough
entropy for this internal alias. I agree that "mv" is a bad choice
for this alias name. One thing that comes in mind here is to use an
alias similar to what we do for dropped attributes, say
........pg.matview.%d........ where %d is the PID. This will very
unlikely cause conflicts.
--
Michael

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

about 5 years ago

In reply to: Michael Paquier (#5)

Re: Alias collision in `refresh materialized view concurrently`

On Fri, May 21, 2021 at 6:08 AM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, May 20, 2021 at 09:14:45PM +0530, Bharath Rupireddy wrote:

On Thu, May 20, 2021 at 7:52 PM Bernd Helmle <mailings@oopsware.de> wrote:

"mv" looks like a very common alias (i use it all over the time when
testing or playing around with materialized views, so i'm wondering why
i didn't see this issue already myself). So the risk here for such a
collision looks very high. We can try to lower this risk by choosing an
alias name, which is not so common. With a static alias however you get
a static error condition, not something that fails here and then.

Another idea is to use random() function to generate required number
of uint32 random values(refresh_by_match_merge might need 3 values to
replace newdata, newdata2 and mv) and use the names like
pg_temp_rmv_<<rand_no1>>, pg_temp_rmv_<<rand_no2>> and so on. This
would make the name unguessable. Note that we use this in
choose_dsm_implementation, dsm_impl_posix.

I am not sure that I see the point of using a random() number here
while the backend ID, or just the PID, would easily provide enough
entropy for this internal alias. I agree that "mv" is a bad choice
for this alias name. One thing that comes in mind here is to use an
alias similar to what we do for dropped attributes, say
........pg.matview.%d........ where %d is the PID. This will very
unlikely cause conflicts.

I agree that backend ID and/or PID is enough. I'm not fully convinced
with using random(). To make it more concrete, how about something
like pg.matview.%d.%d (MyBackendId, MyProcPid)? If the user still sees
some collisions, then IMHO, it's better to ensure that this kind of
table/alias names are not generated outside of the server.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Michael Paquier

michael@paquier.xyz

almost 5 years ago

In reply to: Bharath Rupireddy (#6)

Re: Alias collision in `refresh materialized view concurrently`

On Fri, May 21, 2021 at 03:56:31PM +0530, Bharath Rupireddy wrote:

On Fri, May 21, 2021 at 6:08 AM Michael Paquier <michael@paquier.xyz> wrote:

I am not sure that I see the point of using a random() number here
while the backend ID, or just the PID, would easily provide enough
entropy for this internal alias. I agree that "mv" is a bad choice
for this alias name. One thing that comes in mind here is to use an
alias similar to what we do for dropped attributes, say
........pg.matview.%d........ where %d is the PID. This will very
unlikely cause conflicts.

I agree that backend ID and/or PID is enough. I'm not fully convinced
with using random(). To make it more concrete, how about something
like pg.matview.%d.%d (MyBackendId, MyProcPid)? If the user still sees
some collisions, then IMHO, it's better to ensure that this kind of
table/alias names are not generated outside of the server.

There is no need to have the PID if MyBackendId is enough, so after
considering it I would just choose something like what I quoted above.
Don't we need also to worry about the queries using newdata, newdata2
and diff as aliases? Would you like to implement a patch doing
something like that?
--
Michael

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

almost 5 years ago

In reply to: Michael Paquier (#7)

Re: Alias collision in `refresh materialized view concurrently`

On Tue, Jun 1, 2021 at 7:11 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, May 21, 2021 at 03:56:31PM +0530, Bharath Rupireddy wrote:

On Fri, May 21, 2021 at 6:08 AM Michael Paquier <michael@paquier.xyz> wrote:

I am not sure that I see the point of using a random() number here
while the backend ID, or just the PID, would easily provide enough
entropy for this internal alias. I agree that "mv" is a bad choice
for this alias name. One thing that comes in mind here is to use an
alias similar to what we do for dropped attributes, say
........pg.matview.%d........ where %d is the PID. This will very
unlikely cause conflicts.

I agree that backend ID and/or PID is enough. I'm not fully convinced
with using random(). To make it more concrete, how about something
like pg.matview.%d.%d (MyBackendId, MyProcPid)? If the user still sees
some collisions, then IMHO, it's better to ensure that this kind of
table/alias names are not generated outside of the server.

There is no need to have the PID if MyBackendId is enough, so after
considering it I would just choose something like what I quoted above.
Don't we need also to worry about the queries using newdata, newdata2
and diff as aliases? Would you like to implement a patch doing
something like that?

Sure. PSA v2 patch. We can't have "." as separator in the alias names,
so I used "_" instead.

I used MyProcPid which seems more random than MyBackendId (which is
just a number like 1,2,3...). Even with this, someone could argue that
they can look at the backend PID, use it in the materialized view
names just to trick the server. I'm not sure if anyone would want to
do this.

I used the existing function make_temptable_name_n to prepare the
alias names. The advantage of this is that the code looks cleaner, but
it leaks memory, 1KB string for each call of the function. This is
also true with the existing usage of the function. Now, we will have 5
make_temptable_name_n function calls leaking 5KB memory. And we also
have quote_qualified_identifier leaking memory, 2 function calls, 2KB.
So, in total, these two functions will leak 7KB of memory (with the
patch).

Shall I pfree the memory for all the strings returned by the functions
make_temptable_name_n and quote_qualified_identifier? The problem is
that pfree isn't cheaper.
Or shall we leave it as is so that the memory will be freed up by the context?

Note I have not added tests for this, as the code is covered by the
existing tests.

With Regards,
Bharath Rupireddy.

Bernd Helmle

mailings@oopsware.de

almost 5 years ago

In reply to: Bharath Rupireddy (#8)

Re: Alias collision in `refresh materialized view concurrently`

Am Dienstag, dem 01.06.2021 um 13:13 +0530 schrieb Bharath Rupireddy:

I used MyProcPid which seems more random than MyBackendId (which is
just a number like 1,2,3...). Even with this, someone could argue
that
they can look at the backend PID, use it in the materialized view
names just to trick the server. I'm not sure if anyone would want to
do this.

A generated query likely uses just an incremented value derived from
somewhere and in my opinion 1,2,3 makes it more likely that you get a
chance for collisions if you managed to get the same alias prefix
somehow. So +1 with the MyProcPid...

I used the existing function make_temptable_name_n to prepare the
alias names. The advantage of this is that the code looks cleaner,
but
it leaks memory, 1KB string for each call of the function. This is
also true with the existing usage of the function. Now, we will have
5
make_temptable_name_n function calls leaking 5KB memory. And we also
have quote_qualified_identifier leaking memory, 2 function calls,
2KB.
So, in total, these two functions will leak 7KB of memory (with the
patch).

Shall I pfree the memory for all the strings returned by the
functions
make_temptable_name_n and quote_qualified_identifier? The problem is
that pfree isn't cheaper.
Or shall we leave it as is so that the memory will be freed up by the
context?

afaics the memory context is deleted after execution immediately, so
i'd assume it's okay.

--
Thanks,
Bernd

#10

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

almost 5 years ago

In reply to: Bernd Helmle (#9)

Re: Alias collision in `refresh materialized view concurrently`

On Tue, Jun 1, 2021 at 5:24 PM Bernd Helmle <mailings@oopsware.de> wrote:

Am Dienstag, dem 01.06.2021 um 13:13 +0530 schrieb Bharath Rupireddy:

I used MyProcPid which seems more random than MyBackendId (which is
just a number like 1,2,3...). Even with this, someone could argue
that
they can look at the backend PID, use it in the materialized view
names just to trick the server. I'm not sure if anyone would want to
do this.

A generated query likely uses just an incremented value derived from
somewhere and in my opinion 1,2,3 makes it more likely that you get a
chance for collisions if you managed to get the same alias prefix
somehow. So +1 with the MyProcPid...

Thanks.

I used the existing function make_temptable_name_n to prepare the
alias names. The advantage of this is that the code looks cleaner,
but
it leaks memory, 1KB string for each call of the function. This is
also true with the existing usage of the function. Now, we will have
5
make_temptable_name_n function calls leaking 5KB memory. And we also
have quote_qualified_identifier leaking memory, 2 function calls,
2KB.
So, in total, these two functions will leak 7KB of memory (with the
patch).

Shall I pfree the memory for all the strings returned by the
functions
make_temptable_name_n and quote_qualified_identifier? The problem is
that pfree isn't cheaper.
Or shall we leave it as is so that the memory will be freed up by the
context?

afaics the memory context is deleted after execution immediately, so
i'd assume it's okay.

Yes, the refresh operation happens in the "PortalContext", which gets
destroyed at the end of the query in PortalDrop.

PSA v3 patch. I added a commit message and made some cosmetic adjustments.

With Regards,
Bharath Rupireddy.

#11

Thomas Munro

thomas.munro@gmail.com

almost 5 years ago

In reply to: Bharath Rupireddy (#10)

Re: Alias collision in `refresh materialized view concurrently`

On Wed, Jun 2, 2021 at 2:02 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

PSA v3 patch. I added a commit message and made some cosmetic adjustments.

Reminds me of this fun topic in Lisp:

https://en.wikipedia.org/wiki/Hygienic_macro#Strategies_used_in_languages_that_lack_hygienic_macros

I wondered if we could find a way to make identifiers that regular
queries are prohibited from using, at least by documentation. You
could take advantage of the various constraints on unquoted
identifiers in the standard (for example, something involving $), but
it does seem a shame to remove the ability for users to put absolutely
anything except NUL in quoted identifiers. I do wonder if at least
using something like _$mv would be slightly more principled than
pg_mv_1234, since nothing says pg_XXX is reserved (except in some very
specific places like schema names), and the number on the end seems a
bit cargo-cultish.

#12

Michael Paquier

michael@paquier.xyz

almost 5 years ago

In reply to: Thomas Munro (#11)

Re: Alias collision in `refresh materialized view concurrently`

On Wed, Jun 02, 2021 at 12:30:55PM +1200, Thomas Munro wrote:

I wondered if we could find a way to make identifiers that regular
queries are prohibited from using, at least by documentation. You
could take advantage of the various constraints on unquoted
identifiers in the standard (for example, something involving $), but
it does seem a shame to remove the ability for users to put absolutely
anything except NUL in quoted identifiers. I do wonder if at least
using something like _$mv would be slightly more principled than
pg_mv_1234, since nothing says pg_XXX is reserved (except in some very
specific places like schema names), and the number on the end seems a
bit cargo-cultish.

Yeah, using an underscore at the beginning of the name would have the
advantage to mark the relation as an internal thing.

+                    "(SELECT %s.tid FROM %s %s "
+                    "WHERE %s.tid IS NOT NULL "
+                    "AND %s.%s IS NULL)",
Anyway, I have another problem with the patch: readability.  It
becomes really hard for one to guess to which object or alias portions
of the internal queries refer to, especially with a total of five 
temporary names lying around.  I think that you should drop the
business with make_temptable_name_n(), and just append those extra
underscores and uses of MyProcPid directly in the query string.  The
surroundings of quote_qualified_identifier() require two extra printf
calls, but that does not sound bad to me compared to the long-term
maintenance of those queries.
--
Michael

#13

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

almost 5 years ago

In reply to: Michael Paquier (#12)

Re: Alias collision in `refresh materialized view concurrently`

On Wed, Jun 2, 2021 at 6:33 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Jun 02, 2021 at 12:30:55PM +1200, Thomas Munro wrote:

I wondered if we could find a way to make identifiers that regular
queries are prohibited from using, at least by documentation. You
could take advantage of the various constraints on unquoted
identifiers in the standard (for example, something involving $), but
it does seem a shame to remove the ability for users to put absolutely
anything except NUL in quoted identifiers. I do wonder if at least
using something like _$mv would be slightly more principled than
pg_mv_1234, since nothing says pg_XXX is reserved (except in some very
specific places like schema names), and the number on the end seems a
bit cargo-cultish.

Yeah, using an underscore at the beginning of the name would have the
advantage to mark the relation as an internal thing.
+                    "(SELECT %s.tid FROM %s %s "
+                    "WHERE %s.tid IS NOT NULL "
+                    "AND %s.%s IS NULL)",
Anyway, I have another problem with the patch: readability.  It
becomes really hard for one to guess to which object or alias portions
of the internal queries refer to, especially with a total of five
temporary names lying around.  I think that you should drop the
business with make_temptable_name_n(), and just append those extra
underscores and uses of MyProcPid directly in the query string.  The
surroundings of quote_qualified_identifier() require two extra printf
calls, but that does not sound bad to me compared to the long-term
maintenance of those queries.

Thanks. PSA v4.

With Regards,
Bharath Rupireddy.

#14

Michael Paquier

michael@paquier.xyz

almost 5 years ago

In reply to: Bharath Rupireddy (#13)

Re: Alias collision in `refresh materialized view concurrently`

On Wed, Jun 02, 2021 at 10:53:22AM +0530, Bharath Rupireddy wrote:

Thanks. PSA v4.

Thanks for the new version.

+                    MyProcPid, tempname, MyProcPid, MyProcPid,
+                    tempname, MyProcPid, MyProcPid, MyProcPid,
+                    MyProcPid, MyProcPid, MyProcPid);
This style is still a bit heavy-ish.  Perhaps we should just come back
to Thomas's suggestion and just use a prefix with _$ for all that.
--
Michael

#15

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

almost 5 years ago

In reply to: Michael Paquier (#14)

Re: Alias collision in `refresh materialized view concurrently`

On Wed, Jun 2, 2021 at 1:27 PM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Jun 02, 2021 at 10:53:22AM +0530, Bharath Rupireddy wrote:

Thanks. PSA v4.

Thanks for the new version.

+                    MyProcPid, tempname, MyProcPid, MyProcPid,
+                    tempname, MyProcPid, MyProcPid, MyProcPid,
+                    MyProcPid, MyProcPid, MyProcPid);
This style is still a bit heavy-ish.  Perhaps we should just come back
to Thomas's suggestion and just use a prefix with _$ for all that.

Thanks.The changes with that approach are very minimal. PSA v5 and let
me know if any more changes are needed.

With Regards,
Bharath Rupireddy.

#16

Michael Paquier

michael@paquier.xyz

almost 5 years ago

In reply to: Bharath Rupireddy (#15)

Re: Alias collision in `refresh materialized view concurrently`

On Wed, Jun 02, 2021 at 03:44:45PM +0530, Bharath Rupireddy wrote:

Thanks.The changes with that approach are very minimal. PSA v5 and let
me know if any more changes are needed.

Simple enough, so applied and back-patched. It took 8 years for
somebody to complain about the current aliases, so that should be
enough to get us close to zero conflicts now. I have looked a bit to
see if anybody would use this naming convention, but could not find a
trace, FWIW.
--
Michael

#17

Tom Lane

tgl@sss.pgh.pa.us

almost 5 years ago

In reply to: Michael Paquier (#16)

Re: Alias collision in `refresh materialized view concurrently`

Michael Paquier <michael@paquier.xyz> writes:

On Wed, Jun 02, 2021 at 03:44:45PM +0530, Bharath Rupireddy wrote:

Thanks.The changes with that approach are very minimal. PSA v5 and let
me know if any more changes are needed.

Simple enough, so applied and back-patched.

I just came across this issue while preparing the release notes.
ISTM that people have expended a great deal of effort on a fundamentally
unreliable solution, when a reliable one is easily available.
The originally complained-of issue was that a user-chosen column name
could collide with the query-chosen table name:

ERROR: column reference "mv" is ambiguous
LINE 1: ...alog.=) mv.mv AND newdata OPERATOR(pg_catalog.*=) mv) WHERE ...

This is true, but it's self-inflicted damage, because all you have
to do is write the query so that mv is clearly a table name:

... mv.mv AND newdata.* OPERATOR(pg_catalog.*=) mv.*) WHERE ...

AFAICT that works and generates the identical parse tree to the original
coding. The only place touched by the patch where it's a bit difficult to
make the syntax unambiguous this way is

"CREATE TEMP TABLE %s AS "
"SELECT _$mv.ctid AS tid, _$newdata "
"FROM %s _$mv FULL JOIN %s _$newdata ON (",

because newdata.* would ordinarily get expanded to multiple columns
if it's at the top level of a SELECT list, and that's not what we want.
However, that's easily fixed using the same hack as in ruleutils.c's
get_variable: add a no-op cast to the table's rowtype. So this
would look like

appendStringInfo(&querybuf,
"CREATE TEMP TABLE %s AS "
"SELECT mv.ctid AS tid, newdata.*::%s "
"FROM %s mv FULL JOIN %s newdata ON (",
diffname, matviewname, matviewname, tempname);

Given that it took this long to notice the problem at all, maybe
this is not a fix to cram in on the weekend before the release wrap.
But I don't see why we need to settle for "mostly works" when
"always works" is barely any harder.

regards, tom lane

#18

Tom Lane

tgl@sss.pgh.pa.us

almost 5 years ago

In reply to: Tom Lane (#17)

Re: Alias collision in `refresh materialized view concurrently`

I wrote:

I just came across this issue while preparing the release notes.
ISTM that people have expended a great deal of effort on a fundamentally
unreliable solution, when a reliable one is easily available.

Concretely, I propose the attached.

regards, tom lane

#19

Michael Paquier

michael@paquier.xyz

almost 5 years ago

In reply to: Tom Lane (#17)

Re: Alias collision in `refresh materialized view concurrently`

On Fri, Aug 06, 2021 at 10:48:40AM -0400, Tom Lane wrote:

AFAICT that works and generates the identical parse tree to the original
coding. The only place touched by the patch where it's a bit difficult to
make the syntax unambiguous this way is

"CREATE TEMP TABLE %s AS "
"SELECT _$mv.ctid AS tid, _$newdata "
"FROM %s _$mv FULL JOIN %s _$newdata ON (",

because newdata.* would ordinarily get expanded to multiple columns
if it's at the top level of a SELECT list, and that's not what we want.
However, that's easily fixed using the same hack as in ruleutils.c's
get_variable: add a no-op cast to the table's rowtype. So this
would look like

appendStringInfo(&querybuf,
"CREATE TEMP TABLE %s AS "
"SELECT mv.ctid AS tid, newdata.*::%s "
"FROM %s mv FULL JOIN %s newdata ON (",
diffname, matviewname, matviewname, tempname);

Smart piece. I haven't thought of that.

Given that it took this long to notice the problem at all, maybe
this is not a fix to cram in on the weekend before the release wrap.
But I don't see why we need to settle for "mostly works" when
"always works" is barely any harder.

Yes, I would vote to delay that for a couple of days. That's not
worth taking a risk for.
--
Michael

#20

Tom Lane

tgl@sss.pgh.pa.us

almost 5 years ago

In reply to: Michael Paquier (#19)

Re: Alias collision in `refresh materialized view concurrently`

Michael Paquier <michael@paquier.xyz> writes:

On Fri, Aug 06, 2021 at 10:48:40AM -0400, Tom Lane wrote:

Given that it took this long to notice the problem at all, maybe
this is not a fix to cram in on the weekend before the release wrap.
But I don't see why we need to settle for "mostly works" when
"always works" is barely any harder.

Yes, I would vote to delay that for a couple of days. That's not
worth taking a risk for.

I went ahead and created the patch, including test case, and it
seems fine. So I'm leaning towards pushing that tomorrow. Mainly
because I don't want to have to document "we partially fixed this"
in this release set and then "we really fixed it" three months from
now.

regards, tom lane

Alias collision in `refresh materialized view concurrently`

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: