The output sql generated by pg_dump for a create function refers to a modified table name
Hi,
The output sql generated by pg_dump for the below function refers to a
modified table name:
create table t1 (c1 int);
create table t2 (c1 int);
CREATE OR REPLACE FUNCTION test_fun(c1 int)
RETURNS void
LANGUAGE SQL
BEGIN ATOMIC
WITH delete_t1 AS (
DELETE FROM t1 WHERE c1 = $1
)
INSERT INTO t1 (c1) SELECT $1 FROM t2;
END;
The below sql output created by pg_dump refers to t1_1 which should
have been t1:
CREATE FUNCTION public.test_fun(c1 integer) RETURNS void
LANGUAGE sql
BEGIN ATOMIC
WITH delete_t1 AS (
DELETE FROM public.t1
WHERE (t1_1.c1 = test_fun.c1)
)
INSERT INTO public.t1 (c1) SELECT test_fun.c1
FROM public.t2;
END;
pg_get_function_sqlbody also returns similar result:
select proname, pg_get_function_sqlbody(oid) from pg_proc where
proname = 'test_fun';
proname | pg_get_function_sqlbody
----------+-------------------------------------------
test_fun | BEGIN ATOMIC +
| WITH delete_t1 AS ( +
| DELETE FROM t1 +
| WHERE (t1_1.c1 = test_fun.c1) +
| ) +
| INSERT INTO t1 (c1) SELECT test_fun.c1+
| FROM t2; +
| END
(1 row)
I felt the problem here is with set_rtable_names function which
changes the relation name t1 to t1_1 while parsing the statement:
/*
* If the selected name isn't unique, append digits to make it so, and
* make a new hash entry for it once we've got a unique name. For a
* very long input name, we might have to truncate to stay within
* NAMEDATALEN.
*/
During the query generation we will set the table names before
generating each statement, in our case the table t1 would have been
added already to the hash table during the first insert statement
generation. Next time it will try to set the relation names again for
the next statement, i.e delete statement, if the entry with same name
already exists, it will change the name to t1_1 by appending a digit
to keep the has entry unique.
Regards,
Vignesh
On 2/17/23 5:22 AM, vignesh C wrote:
Hi,
The output sql generated by pg_dump for the below function refers to a
modified table name:
create table t1 (c1 int);
create table t2 (c1 int);CREATE OR REPLACE FUNCTION test_fun(c1 int)
RETURNS void
LANGUAGE SQL
BEGIN ATOMIC
WITH delete_t1 AS (
DELETE FROM t1 WHERE c1 = $1
)
INSERT INTO t1 (c1) SELECT $1 FROM t2;
END;The below sql output created by pg_dump refers to t1_1 which should
have been t1:
CREATE FUNCTION public.test_fun(c1 integer) RETURNS void
LANGUAGE sql
BEGIN ATOMIC
WITH delete_t1 AS (
DELETE FROM public.t1
WHERE (t1_1.c1 = test_fun.c1)
)
INSERT INTO public.t1 (c1) SELECT test_fun.c1
FROM public.t2;
END;pg_get_function_sqlbody also returns similar result:
select proname, pg_get_function_sqlbody(oid) from pg_proc where
proname = 'test_fun';
proname | pg_get_function_sqlbody
----------+-------------------------------------------
test_fun | BEGIN ATOMIC +
| WITH delete_t1 AS ( +
| DELETE FROM t1 +
| WHERE (t1_1.c1 = test_fun.c1) +
| ) +
| INSERT INTO t1 (c1) SELECT test_fun.c1+
| FROM t2; +
| END
(1 row)
Thanks for reproducing and demonstrating that this was more generally
applicable. For context, this was initially discovered when testing the
DDL replication patch[1]/messages/by-id/e947fa21-24b2-f922-375a-d4f763ef3e4b@postgresql.org under that context.
I felt the problem here is with set_rtable_names function which
changes the relation name t1 to t1_1 while parsing the statement:
/*
* If the selected name isn't unique, append digits to make it so, and
* make a new hash entry for it once we've got a unique name. For a
* very long input name, we might have to truncate to stay within
* NAMEDATALEN.
*/During the query generation we will set the table names before
generating each statement, in our case the table t1 would have been
added already to the hash table during the first insert statement
generation. Next time it will try to set the relation names again for
the next statement, i.e delete statement, if the entry with same name
already exists, it will change the name to t1_1 by appending a digit
to keep the has entry unique.
Good catch. Do you have thoughts on how we can adjust the naming logic
to handle cases like this?
Jonathan
[1]: /messages/by-id/e947fa21-24b2-f922-375a-d4f763ef3e4b@postgresql.org
/messages/by-id/e947fa21-24b2-f922-375a-d4f763ef3e4b@postgresql.org
"Jonathan S. Katz" <jkatz@postgresql.org> writes:
Good catch. Do you have thoughts on how we can adjust the naming logic
to handle cases like this?
I think it's perfectly fine that ruleutils decided to use different
aliases for the two different occurrences of "t1": the statement is
quite confusing as written. The problem probably is that
get_delete_query_def() has no idea that it's supposed to print the
adjusted alias just after "DELETE FROM tab". UPDATE likely has same
issue ... maybe INSERT too?
regards, tom lane
On 2/17/23 10:09 AM, Tom Lane wrote:
"Jonathan S. Katz" <jkatz@postgresql.org> writes:
Good catch. Do you have thoughts on how we can adjust the naming logic
to handle cases like this?I think it's perfectly fine that ruleutils decided to use different
aliases for the two different occurrences of "t1": the statement is
quite confusing as written.
Agreed on that -- while it's harder to set up, I do prefer the original
example[1]/messages/by-id/e947fa21-24b2-f922-375a-d4f763ef3e4b@postgresql.org to demonstrate this, as it shows the issue given it does not
have those multiple occurrences, at least not within the same context, i.e.:
CREATE OR REPLACE FUNCTION public.calendar_manage(room_id int,
calendar_date date)
RETURNS void
LANGUAGE SQL
BEGIN ATOMIC
WITH delete_calendar AS (
DELETE FROM calendar
WHERE
room_id = $1 AND
calendar_date = $2
)
INSERT INTO calendar (room_id, status, calendar_date, calendar_range)
SELECT $1, c.status, $2, c.calendar_range
FROM calendar_generate_calendar($1, tstzrange($2, $2 + 1)) c;
END;
the table prefixes on the attributes within the DELETE statement were
ultimately mangled:
WITH delete_calendar AS (
DELETE FROM public.calendar
WHERE ((calendar_1.room_id OPERATOR(pg_catalog.=)
calendar_manage.room_id) AND (calendar_1.calendar_date
OPERATOR(pg_catalog.=) calendar_manage.calendar_date))
)
INSERT INTO public.calendar (room_id, status, calendar_date,
calendar_range)
The problem probably is that
get_delete_query_def() has no idea that it's supposed to print the
adjusted alias just after "DELETE FROM tab". UPDATE likely has same
issue ... maybe INSERT too?
Maybe? I modified the function above to do an INSERT/UPDATE instead of a
DELETE but I did not get any errors. However, if the logic is similar
there could be an issue there.
Thanks,
Jonathan
[1]: /messages/by-id/e947fa21-24b2-f922-375a-d4f763ef3e4b@postgresql.org
/messages/by-id/e947fa21-24b2-f922-375a-d4f763ef3e4b@postgresql.org
On 2/17/23 11:19 AM, Jonathan S. Katz wrote:
On 2/17/23 10:09 AM, Tom Lane wrote:
Agreed on that -- while it's harder to set up, I do prefer the original
example[1] to demonstrate this, as it shows the issue given it does not
have those multiple occurrences, at least not within the same context,
i.e.:CREATE OR REPLACE FUNCTION public.calendar_manage(room_id int,
calendar_date date)
RETURNS void
LANGUAGE SQL
BEGIN ATOMIC
WITH delete_calendar AS (
DELETE FROM calendar
WHERE
room_id = $1 AND
calendar_date = $2
)
INSERT INTO calendar (room_id, status, calendar_date, calendar_range)
SELECT $1, c.status, $2, c.calendar_range
FROM calendar_generate_calendar($1, tstzrange($2, $2 + 1)) c;
END;The problem probably is that
get_delete_query_def() has no idea that it's supposed to print the
adjusted alias just after "DELETE FROM tab". UPDATE likely has same
issue ... maybe INSERT too?Maybe? I modified the function above to do an INSERT/UPDATE instead of a
DELETE but I did not get any errors. However, if the logic is similar
there could be an issue there.
I spoke too soon -- I was looking at the wrong logs. I did reproduce it
with UPDATE, but not INSERT. The example I used for UPDATE:
CREATE OR REPLACE FUNCTION public.calendar_manage(room_id int,
calendar_date date)
RETURNS void
LANGUAGE SQL
BEGIN ATOMIC
WITH update_calendar AS (
UPDATE calendar
SET room_id = $1
WHERE
room_id = $1 AND
calendar_date = $2
)
INSERT INTO calendar (room_id, status, calendar_date, calendar_range)
SELECT $1, c.status, $2, c.calendar_range
FROM calendar_generate_calendar($1, tstzrange($2, $2 + 1)) c;
END;
which produced:
WITH update_calendar AS (
UPDATE public.calendar SET room_id = calendar_manage.room_id
WHERE (
(calendar_1.room_id OPERATOR(pg_catalog.=)
calendar_manage.room_id) AND (calendar_1.calendar_date
OPERATOR(pg_catalog.=) calendar_manage.calendar_date))
)
INSERT INTO public.calendar (room_id, status, calendar_date,
calendar_range) SELECT calendar_manage.room_id,
c.status,
calendar_manage.calendar_date,
c.calendar_range
FROM public.calendar_generate_calendar(calendar_manage.room_id,
pg_catalog.tstzrange((calendar_manage.calendar_date)::timestamp with
time zone, ((calendar_manage.calendar_date OPERATOR(pg_catalog.+)
1))::timestamp with time zone)) c(status, calendar_range);
Thanks,
Jonathan
"Jonathan S. Katz" <jkatz@postgresql.org> writes:
I spoke too soon -- I was looking at the wrong logs. I did reproduce it
with UPDATE, but not INSERT.
It can be reproduced with INSERT too, on the same principle as the others:
put the DML command inside a WITH, and give it an alias conflicting with
the outer query.
Being a lazy sort, I tried to collapse all three cases into a single
test case, and observed something I hadn't thought of: we disambiguate
aliases in a WITH query with respect to the outer query, but not with
respect to other WITH queries. This makes the example (see attached)
a bit more confusing than I would have hoped. However, the same sort
of thing happens within other kinds of nested subqueries, so I think
it's probably all right as-is. In any case, changing this aspect
would require a significantly bigger patch with more risk of unwanted
side-effects.
To fix it, I pulled out the print-an-alias logic within
get_from_clause_item and called that new function for
INSERT/UPDATE/DELETE. This is a bit of overkill perhaps, because
only the RTE_RELATION case can be needed by these other callers, but
it seemed like a sane refactorization.
I've not tested, but I imagine this will need patched all the way back.
The rule case should be reachable in all supported versions.
regards, tom lane
Attachments:
generalize-DML-target-alias-logic-1.patchtext/x-diff; charset=us-ascii; name=generalize-DML-target-alias-logic-1.patchDownload+130-83
On 2/17/23 1:18 PM, Tom Lane wrote:
It can be reproduced with INSERT too, on the same principle as the others:
put the DML command inside a WITH, and give it an alias conflicting with
the outer query.
Ah, I see based on your example below. I did not alias the INSERT
statement in the way (and I don't know how common of a pattern it is to
o that).
Being a lazy sort, I tried to collapse all three cases into a single
test case, and observed something I hadn't thought of: we disambiguate
aliases in a WITH query with respect to the outer query, but not with
respect to other WITH queries. This makes the example (see attached)
a bit more confusing than I would have hoped. However, the same sort
of thing happens within other kinds of nested subqueries, so I think
it's probably all right as-is. In any case, changing this aspect
would require a significantly bigger patch with more risk of unwanted
side-effects.
I think I agree. Most people should not be looking at the disambiguated
statements unless they are troubleshooting an issue (such as $SUBJECT).
The main goal is to disambiguate correctly.
To fix it, I pulled out the print-an-alias logic within
get_from_clause_item and called that new function for
INSERT/UPDATE/DELETE. This is a bit of overkill perhaps, because
only the RTE_RELATION case can be needed by these other callers, but
it seemed like a sane refactorization.I've not tested, but I imagine this will need patched all the way back.
The rule case should be reachable in all supported versions.
I tested this against HEAD (+v69 of the DDL replication patch). My cases
are now all passing.
The code looks good to me -- I don't know if moving that logic is
overkill, but it makes the solution relatively clean.
I didn't test in any back branches yet, but given this can generate an
invalid function body, it does likely need to be backpatched.
Thanks,
Jonathan
"Jonathan S. Katz" <jkatz@postgresql.org> writes:
On 2/17/23 1:18 PM, Tom Lane wrote:
It can be reproduced with INSERT too, on the same principle as the others:
put the DML command inside a WITH, and give it an alias conflicting with
the outer query.
Ah, I see based on your example below. I did not alias the INSERT
statement in the way (and I don't know how common of a pattern it is to
o that).
I suppose you can also make examples where the true name of the DML
target table conflicts with an outer-query name, implying that we need
to give it an alias even though the user wrote none.
I tested this against HEAD (+v69 of the DDL replication patch). My cases
are now all passing.
The code looks good to me -- I don't know if moving that logic is
overkill, but it makes the solution relatively clean.
Cool, thanks for testing and code-reading. I'll go see about
back-patching.
I didn't test in any back branches yet, but given this can generate an
invalid function body, it does likely need to be backpatched.
Presumably it can also cause dump/restore failures for rules that
do this sort of thing, though admittedly those wouldn't be common.
regards, tom lane