some grammar refactoring
Here is a series of patches to do some refactoring in the grammar around
the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
ADD/DROP. In the grammar, these commands (with some exceptions)
basically just take a reference to an object and later look it up in C
code. Some of that was already generalized individually for each
command (drop_type_any_name, drop_type_name, etc.). This patch combines
it into common lists for all these commands.
Advantages:
- Avoids having to list each object type at least four times.
- Object types not supported by security labels or extensions are now
explicitly listed and give a proper error message. Previously, this was
just encoded in the grammar itself and specifying a non-supported object
type would just give a parse error.
- Reduces lines of code in gram.y.
- Removes some old cruft.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v1-0003-Remove-deprecated-syntax-from-CREATE-DROP-LANGUAG.patchtext/plain; charset=UTF-8; name=v1-0003-Remove-deprecated-syntax-from-CREATE-DROP-LANGUAG.patch; x-mac-creator=0; x-mac-type=0Download+5-12
v1-0004-Refactor-DROP-LANGUAGE-grammar.patchtext/plain; charset=UTF-8; name=v1-0004-Refactor-DROP-LANGUAGE-grammar.patch; x-mac-creator=0; x-mac-type=0Download+2-26
v1-0005-Grammar-object-type-refactoring.patchtext/plain; charset=UTF-8; name=v1-0005-Grammar-object-type-refactoring.patch; x-mac-creator=0; x-mac-type=0Download+121-115
v1-0006-Refactor-AlterExtensionContentsStmt-grammar.patchtext/plain; charset=UTF-8; name=v1-0006-Refactor-AlterExtensionContentsStmt-grammar.patch; x-mac-creator=0; x-mac-type=0Download+31-150
v1-0001-Remove-redundant-grammar-symbols.patchtext/plain; charset=UTF-8; name=v1-0001-Remove-redundant-grammar-symbols.patch; x-mac-creator=0; x-mac-type=0Download+41-50
v1-0002-Fold-AlterForeignTableStmt-into-AlterTableStmt.patchtext/plain; charset=UTF-8; name=v1-0002-Fold-AlterForeignTableStmt-into-AlterTableStmt.patch; x-mac-creator=0; x-mac-type=0Download+19-31
On Tue, May 19, 2020 at 2:43 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
Here is a series of patches to do some refactoring in the grammar around
the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
ADD/DROP. In the grammar, these commands (with some exceptions)
basically just take a reference to an object and later look it up in C
code. Some of that was already generalized individually for each
command (drop_type_any_name, drop_type_name, etc.). This patch combines
it into common lists for all these commands.
I haven't reviewed the code, but +1 for the idea.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 19, 2020 at 12:13 PM Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:
Here is a series of patches to do some refactoring in the grammar around
the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
ADD/DROP. In the grammar, these commands (with some exceptions)
basically just take a reference to an object and later look it up in C
code. Some of that was already generalized individually for each
command (drop_type_any_name, drop_type_name, etc.). This patch combines
it into common lists for all these commands.Advantages:
- Avoids having to list each object type at least four times.
- Object types not supported by security labels or extensions are now
explicitly listed and give a proper error message. Previously, this was
just encoded in the grammar itself and specifying a non-supported object
type would just give a parse error.- Reduces lines of code in gram.y.
- Removes some old cruft.
I liked the idea.
I had quick glance through the patches and also did quick review and
testing.
I haven't found any issue with the patch.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Rushabh Lathia
On 2020-05-19 08:43, Peter Eisentraut wrote:
Here is a series of patches to do some refactoring in the grammar around
the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
ADD/DROP. In the grammar, these commands (with some exceptions)
basically just take a reference to an object and later look it up in C
code. Some of that was already generalized individually for each
command (drop_type_any_name, drop_type_name, etc.). This patch combines
it into common lists for all these commands.
While most of this patch set makes no behavior changes by design, I
should point out this little change hidden in the middle:
Remove deprecated syntax from CREATE/DROP LANGUAGE
Remove the option to specify the language name as a single-quoted
string. This has been obsolete since ee8ed85da3b. Removing it allows
better grammar refactoring.
The syntax of the CREATE FUNCTION LANGUAGE clause is not changed.
(ee8ed85da3b is in PG 7.2.)
I expect this to be uncontroversial, but it should be pointed out.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On May 18, 2020, at 11:43 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
Here is a series of patches to do some refactoring in the grammar around the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ... ADD/DROP. In the grammar, these commands (with some exceptions) basically just take a reference to an object and later look it up in C code. Some of that was already generalized individually for each command (drop_type_any_name, drop_type_name, etc.). This patch combines it into common lists for all these commands.
Advantages:
- Avoids having to list each object type at least four times.
- Object types not supported by security labels or extensions are now explicitly listed and give a proper error message. Previously, this was just encoded in the grammar itself and specifying a non-supported object type would just give a parse error.
- Reduces lines of code in gram.y.
- Removes some old cruft.
I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid object types out of gram.y and into extension.c raises an organizational question. At some places in gram.y, there is C code that checks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within gram.y. In many other places, including what you are doing in this patch, the token is merely stored in a Stmt object with the error checking delayed until command processing. For tokens which need to be checked against the catalogs, that decision makes perfect sense. But for ones where all the information necessary to validate the token exists in the parser, it is not clear to me why it gets delayed until command processing. Is there a design principle behind when these checks are done in gram.y vs. when they are delayed to the command processing? I'm guessing in v1-0006 that you are doing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying the check until ExecAlterExtensionContentsStmt, you can put the check all in one place. Is that all it is?
I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent format and location, rather than scattered through gram.y and backend/commands/. Does anybody else have an interest in this?
My interest in this stems from the fact that bison can be run to generate data files that can then be used in reverse to generate random SQL. The more the parsing logic is visible to bison, the more useful the generated data files are. But a single, consistent design for extra-grammatical error checks could help augment those files fairly well, too.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2020-05-22 18:53, Mark Dilger wrote:
I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid object types out of gram.y and into extension.c raises an organizational question. At some places in gram.y, there is C code that checks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within gram.y. In many other places, including what you are doing in this patch, the token is merely stored in a Stmt object with the error checking delayed until command processing. For tokens which need to be checked against the catalogs, that decision makes perfect sense. But for ones where all the information necessary to validate the token exists in the parser, it is not clear to me why it gets delayed until command processing. Is there a design principle behind when these checks are done in gram.y vs. when they are delayed to the command processing? I'm guessing in v1-0006 that you are doing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying the check until ExecAlterExtensionContentsStmt, you can put the check all in one place. Is that all it is?
We have been for some time moving to a style where we rely on switch
statements around OBJECT_* constants to (a) decide what is allowed with
certain object types, and (b) make sure we have an explicit decision on
each object type and don't forget any. This has worked well, I think.
This is more of that. Before this patch, it would have been pretty hard
to find out which object types are supported with extensions or security
labels, except by very carefully reading the grammar.
Moreover, you now get a proper error message for unsupported object
types rather than just a generic parse error.
I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent format and location, rather than scattered through gram.y and backend/commands/. Does anybody else have an interest in this?
My interest in this stems from the fact that bison can be run to generate data files that can then be used in reverse to generate random SQL. The more the parsing logic is visible to bison, the more useful the generated data files are. But a single, consistent design for extra-grammatical error checks could help augment those files fairly well, too.
It's certainly already the case that the grammar accepts statements that
end up being invalid, even if you ignore catalog lookup. I don't think
my patch moves the needle on this in a significant way.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On May 25, 2020, at 2:55 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2020-05-22 18:53, Mark Dilger wrote:
I like the general direction you are going with this, but the decision in v1-0006 to move the error for invalid object types out of gram.y and into extension.c raises an organizational question. At some places in gram.y, there is C code that checks parsed tokens and ereports if they are invalid, in some sense extending the grammar right within gram.y. In many other places, including what you are doing in this patch, the token is merely stored in a Stmt object with the error checking delayed until command processing. For tokens which need to be checked against the catalogs, that decision makes perfect sense. But for ones where all the information necessary to validate the token exists in the parser, it is not clear to me why it gets delayed until command processing. Is there a design principle behind when these checks are done in gram.y vs. when they are delayed to the command processing? I'm guessing in v1-0006 that you are doing it this way because there are multiple places in gram.y where tokens would need to be checked, and by delaying the check until ExecAlterExtensionContentsStmt, you can put the check all in one place. Is that all it is?
We have been for some time moving to a style where we rely on switch statements around OBJECT_* constants to (a) decide what is allowed with certain object types, and (b) make sure we have an explicit decision on each object type and don't forget any. This has worked well, I think.
Yes, I think so, too. I like that overall design.
This is more of that.
Yes, it is.
Before this patch, it would have been pretty hard to find out which object types are supported with extensions or security labels, except by very carefully reading the grammar.
Fair enough.
Moreover, you now get a proper error message for unsupported object types rather than just a generic parse error.
Sounds great.
I have had reason in the past to want to reorganize gram.y to have all these types of checks in a single, consistent format and location, rather than scattered through gram.y and backend/commands/. Does anybody else have an interest in this?
My interest in this stems from the fact that bison can be run to generate data files that can then be used in reverse to generate random SQL. The more the parsing logic is visible to bison, the more useful the generated data files are. But a single, consistent design for extra-grammatical error checks could help augment those files fairly well, too.It's certainly already the case that the grammar accepts statements that end up being invalid, even if you ignore catalog lookup. I don't think my patch moves the needle on this in a significant way.
I don't think it moves the needle too much, either. But since your patch is entirely a refactoring patch and not a feature patch, I thought it would be fair to ask larger questions about how the code should be structured. I like using enums and switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why that should be in the command execution step. It feels like a layering violation to me.
I don't object to this patch getting committed. A subsequent patch to consolidate all the grammar checks into src/backend/parser and out of src/backend/commands won't be blocked by this.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2020-05-25 21:09, Mark Dilger wrote:
I don't think it moves the needle too much, either. But since your patch is entirely a refactoring patch and not a feature patch, I thought it would be fair to ask larger questions about how the code should be structured. I like using enums and switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why that should be in the command execution step. It feels like a layering violation to me.
Most utility commands don't have an intermediate parse analysis pass.
They just go straight from the grammar to the execution. Maybe that
could be rethought, but that's the way it is now.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, May 26, 2020 at 4:28 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-05-25 21:09, Mark Dilger wrote:
I don't think it moves the needle too much, either. But since your patch is entirely a refactoring patch and not a feature patch, I thought it would be fair to ask larger questions about how the code should be structured. I like using enums and switch statements and getting better error messages, but there doesn't seem to be any fundamental reason why that should be in the command execution step. It feels like a layering violation to me.
Most utility commands don't have an intermediate parse analysis pass.
They just go straight from the grammar to the execution. Maybe that
could be rethought, but that's the way it is now.
I think it can and should be rethought at some point. The present
split leads to a lot of weird coding. We've had security
vulnerabilities that were due to things like passing the same RangeVar
to two different places, leading to two different lookups for the name
that could be induced to return different OIDs. It also leads to a lot
of fuzzy thinking about where locks are taken, in which order, and how
many times, and with what strength. The code for queries seems to have
been thought through a lot more carefully, because the existence of
prepared queries makes mistakes a lot more noticeable. I hope some day
someone will be motivated to improve the situation for DDL as well,
though it will probably be a thankless task.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, May 26, 2020 at 4:28 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:Most utility commands don't have an intermediate parse analysis pass.
They just go straight from the grammar to the execution. Maybe that
could be rethought, but that's the way it is now.
I think it can and should be rethought at some point.
The other problem is that the ones that do have explicit parse analysis
tend to be doing it at the wrong time. I've fixed some ALTER TABLE
problems by rearranging that, but we still have open bugs that are due
to this type of mistake, eg [1]/messages/by-id/16272-6e32da020e9a9381@postgresql.org. I agree that we need a rethink, and
we need it badly.
If this patch is changing when any parse-analysis-like actions happen,
then I would say that it needs very careful review --- much more than
the "refactoring" label would suggest. Maybe it's making things better,
or maybe it doesn't matter; but this area is a minefield.
regards, tom lane
On 2020-05-19 08:43, Peter Eisentraut wrote:
Here is a series of patches to do some refactoring in the grammar around
the commands COMMENT, DROP, SECURITY LABEL, and ALTER EXTENSION ...
ADD/DROP.
These patches have been committed.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services