psql: Fix CREATE SCHEMA scanning of nested routine bodies

Started by Chao Li1 day ago6 messageshackers

li.evan.chao@gmail.com

1 day ago

Hi,

While testing “[d51697484] Support more object types within CREATE SCHEMA”, I found an issue that seems to be a psql regression.

See this repro:
```
evantest=# CREATE SCHEMA s CREATE VIEW begin AS SELECT 1;
evantest-# end;
WARNING: there is no transaction in progress
CREATE SCHEMA
COMMIT
```

After the first CREATE SCHEMA line, the prompt changed from =# to -#, meaning psql appeared to be treating the view name “begin" as the start of a BEGIN ... END block. After I typed “end;" on the second line, the server executed the first line properly, then executed the second line as “end;", producing the “no transaction” warning. So, although psql waited for “end;", it still sent two statements to the server. This suggests a psql bug.

I then went back to commit 404db8f9, the immediate parent of d51697484, and ran the same test:
```
evantest=# CREATE SCHEMA s CREATE VIEW begin AS SELECT 1;
CREATE SCHEMA
evantest=#
evantest=# select * from s.begin;
?column?
----------
1
(1 row)
```

It created schema “s" and created a view named “begin" under “s". So this seems to confirm a psql regression introduced by d51697484.

Looking at psqlscan.l, d51697484 made BEGIN/END tracking apply to CREATE SCHEMA, which seems too broad. I think we should only track BEGIN ... END for CREATE SCHEMA after seeing CREATE [OR REPLACE] FUNCTION/PROCEDURE. Following this direction, I tried to make a fix.

I added some test cases, but I’m not sure whether 001_basic.pl is the right place for them. Should these tests be added to a new file, say 040_psqlscan.pl? I saw src/test/regress/sql/create_schema.sql, but I believe that is for server-side testing, so these client-side tests should not go there. Suggestions are appreciated.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Tom Lane

tgl@sss.pgh.pa.us

1 day ago

In reply to: Chao Li (#1)

Re: psql: Fix CREATE SCHEMA scanning of nested routine bodies

Chao Li <li.evan.chao@gmail.com> writes:

See this repro:
```
evantest=# CREATE SCHEMA s CREATE VIEW begin AS SELECT 1;
evantest-# end;
WARNING: there is no transaction in progress
CREATE SCHEMA
COMMIT
```

Meh. Yeah, you can fool psql by using non-reserved keywords as
identifiers. I'm not super excited about adding complexity that
resolves specific edge cases of that kind, because there will always
be more of them. As an example, this fools both HEAD and the back
branches:

regression=# create function begin() returns int
regression-# begin atomic return 2+2; end;
regression-#

Is it really likely that anyone will use "begin" as an object name?

Having said that, I agree that what psqlscan.l is doing now is quite
hokey and maybe we should make it smarter. I don't like the details
of your patch too much though. It's too obviously bolted on: the
added parsing logic looks nothing like what was there before.
Also, I think it can still be fooled by

create schema s
create function f() ... as $$ function body $$
create view begin as ...

because you don't reset create_schema_routine_body unless you
see "end".

What I'd think about doing, if you want to pursue this, is to
have a buffer similar to cur_state->identifiers[] but it tracks
identifiers starting from the most recent CREATE sub-clause
start within CREATE SCHEMA (ie, a non-nested CREATE keyword),
and then applies basically the same logic as in the original code
to decide if we're within a CREATE FUNCTION sub-clause.

Also, you could avoid cluttering other productions by resetting
all the new state when identifier_count is zero, in the same place
where we reset cur_state->identifiers[] today.

It'd likely be appropriate to pull all this logic out of the
{identifier} production and put it in a subroutine, just because
it's getting unduly complicated.

regards, tom lane

Chao Li

li.evan.chao@gmail.com

1 day ago

In reply to: Tom Lane (#2)

Re: psql: Fix CREATE SCHEMA scanning of nested routine bodies

On Jun 23, 2026, at 00:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chao Li <li.evan.chao@gmail.com> writes:

See this repro:
```
evantest=# CREATE SCHEMA s CREATE VIEW begin AS SELECT 1;
evantest-# end;
WARNING: there is no transaction in progress
CREATE SCHEMA
COMMIT
```

Meh. Yeah, you can fool psql by using non-reserved keywords as
identifiers. I'm not super excited about adding complexity that
resolves specific edge cases of that kind, because there will always
be more of them. As an example, this fools both HEAD and the back
branches:

regression=# create function begin() returns int
regression-# begin atomic return 2+2; end;
regression-#

Is it really likely that anyone will use "begin" as an object name?

Having said that, I agree that what psqlscan.l is doing now is quite
hokey and maybe we should make it smarter. I don't like the details
of your patch too much though. It's too obviously bolted on: the
added parsing logic looks nothing like what was there before.
Also, I think it can still be fooled by

create schema s
create function f() ... as $$ function body $$
create view begin as ...

because you don't reset create_schema_routine_body unless you
see "end".

What I'd think about doing, if you want to pursue this, is to
have a buffer similar to cur_state->identifiers[] but it tracks
identifiers starting from the most recent CREATE sub-clause
start within CREATE SCHEMA (ie, a non-nested CREATE keyword),
and then applies basically the same logic as in the original code
to decide if we're within a CREATE FUNCTION sub-clause.

Also, you could avoid cluttering other productions by resetting
all the new state when identifier_count is zero, in the same place
where we reset cur_state->identifiers[] today.

It'd likely be appropriate to pull all this logic out of the
{identifier} production and put it in a subroutine, just because
it's getting unduly complicated.

regards, tom lane

Thanks for the detailed explanation. I will try to rework the patch along the direction you suggested.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Chao Li

li.evan.chao@gmail.com

about 19 hours ago

In reply to: Chao Li (#3)

Re: psql: Fix CREATE SCHEMA scanning of nested routine bodies

On Jun 23, 2026, at 07:08, Chao Li <li.evan.chao@gmail.com> wrote:

On Jun 23, 2026, at 00:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chao Li <li.evan.chao@gmail.com> writes:

See this repro:
```
evantest=# CREATE SCHEMA s CREATE VIEW begin AS SELECT 1;
evantest-# end;
WARNING: there is no transaction in progress
CREATE SCHEMA
COMMIT
```

Meh. Yeah, you can fool psql by using non-reserved keywords as
identifiers. I'm not super excited about adding complexity that
resolves specific edge cases of that kind, because there will always
be more of them. As an example, this fools both HEAD and the back
branches:

regression=# create function begin() returns int
regression-# begin atomic return 2+2; end;
regression-#

Is it really likely that anyone will use "begin" as an object name?

Having said that, I agree that what psqlscan.l is doing now is quite
hokey and maybe we should make it smarter. I don't like the details
of your patch too much though. It's too obviously bolted on: the
added parsing logic looks nothing like what was there before.
Also, I think it can still be fooled by

create schema s
create function f() ... as $$ function body $$
create view begin as ...

because you don't reset create_schema_routine_body unless you
see "end".

What I'd think about doing, if you want to pursue this, is to
have a buffer similar to cur_state->identifiers[] but it tracks
identifiers starting from the most recent CREATE sub-clause
start within CREATE SCHEMA (ie, a non-nested CREATE keyword),
and then applies basically the same logic as in the original code
to decide if we're within a CREATE FUNCTION sub-clause.

Also, you could avoid cluttering other productions by resetting
all the new state when identifier_count is zero, in the same place
where we reset cur_state->identifiers[] today.

It'd likely be appropriate to pull all this logic out of the
{identifier} production and put it in a subroutine, just because
it's getting unduly complicated.

regards, tom lane

Thanks for the detailed explanation. I will try to rework the patch along the direction you suggested.

Following your suggestion, I moved the "{identifier}" logic into a new helper psqlscan_track_identifier(), and added “create_schema_identifiers" to PsqlScanStateData to track identifiers from the current top-level CREATE element within CREATE SCHEMA in the same way as the top level “identifiers". I also added a few more test cases.

Please see the attached v2 for details.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Tom Lane

tgl@sss.pgh.pa.us

about 9 hours ago

In reply to: Chao Li (#4)

Re: psql: Fix CREATE SCHEMA scanning of nested routine bodies

Chao Li <li.evan.chao@gmail.com> writes:

Following your suggestion, I moved the "{identifier}" logic into a new helper psqlscan_track_identifier(), and added “create_schema_identifiers" to PsqlScanStateData to track identifiers from the current top-level CREATE element within CREATE SCHEMA in the same way as the top level “identifiers". I also added a few more test cases.

Please see the attached v2 for details.

Pushed with some mostly-cosmetic changes. Notably, I renamed the
existing "identifiers" field to help distinguish it from the new one.

I didn't use your test cases because it seemed unduly expensive to do
it that way. We can perfectly well exercise this as part of plain-SQL
regression testing, so I just added a booby-trap case in
create_schema.sql. Without this fix, psql marches on and tries to run
the various \d commands after the CREATE SCHEMA before it's sent the
CREATE SCHEMA command. So even though no error is reported, the
output is quite different.

regards, tom lane

Chao Li

li.evan.chao@gmail.com

about 2 hours ago

In reply to: Tom Lane (#5)

Re: psql: Fix CREATE SCHEMA scanning of nested routine bodies

On Jun 24, 2026, at 02:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chao Li <li.evan.chao@gmail.com> writes:

Following your suggestion, I moved the "{identifier}" logic into a new helper psqlscan_track_identifier(), and added “create_schema_identifiers" to PsqlScanStateData to track identifiers from the current top-level CREATE element within CREATE SCHEMA in the same way as the top level “identifiers". I also added a few more test cases.

Please see the attached v2 for details.

Pushed with some mostly-cosmetic changes.

Thanks for pushing.

Notably, I renamed the
existing "identifiers" field to help distinguish it from the new one.

The renaming makes sense.

I didn't use your test cases because it seemed unduly expensive to do
it that way. We can perfectly well exercise this as part of plain-SQL
regression testing, so I just added a booby-trap case in
create_schema.sql Without this fix, psql marches on and tries to run
the various \d commands after the CREATE SCHEMA before it's sent the
CREATE SCHEMA command. So even though no error is reported, the
output is quite different.

I think I over worried about the test. Thank for simplifying the tests.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

psql: Fix CREATE SCHEMA scanning of nested routine bodies

Attachments:

Attachments: