Getting rid of SQLValueFunction

Started by Michael Paquierover 3 years ago17 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

I have bumped a few days ago on the fact that COERCE_SQL_SYNTAX
(introduced by 40c24bf) and SQLValueFunction are around to do the
exact same thing, as known as enforcing single-function calls with
dedicated SQL keywords. For example, keywords like SESSION_USER,
CURRENT_DATE, etc. go through SQLValueFunction and rely on the parser
to set a state that gets then used in execExprInterp.c. And it is
rather easy to implement incorrect SQLValueFunctions, as these rely on
more hardcoded assumptions in the parser and executor than the
equivalent FuncCalls (like collation to assign when using a text-like
SQLValueFunctions).

There are two categories of single-value functions:
- The ones returning names, where we enforce a C collation in two
places of the code (current_role, role, current_catalog,
current_schema, current_database, current_user), even if
get_typcollation() should do that for name types.
- The ones working on time, date and timestamps (localtime[stamp],
current_date, current_time[stamp]), for 9 patterns as these accept an
optional typmod.

I have dug into the possibility to unify all that with a single
interface, and finish with the attached patch set which is a reduction
of code, where all the SQLValueFunctions are replaced by a set of
FuncCalls:
25 files changed, 338 insertions(+), 477 deletions(-)

0001 is the move done for the name-related functions, cleaning up two
places in the executor when a C collation is assigned to those
function expressions. 0002 is the remaining cleanup for the
time-related ones, moving a set of parser-side checks to the execution
path within each function, so as all this knowledge is now local to
each file holding the date and timestamp types. Most of the gain is
in 0002, obviously.

The pg_proc entries introduced for the sake of the move use the same
name as the SQL keywords. These should perhaps be prefixed with a
"pg_" at least. There would be an exception with pg_localtime[stamp],
though, where we could use a pg_localtime[stamp]_sql for the function
name for prosrc. I am open to suggestions for these names.

Thoughts?
--
Michael

Attachments:

0001-Remove-from-SQLValueFunctionOp-all-name-based-functi.patchtext/x-diff; charset=us-asciiDownload+68-94
0002-Replace-SQLValueFunction-by-direct-function-calls.patchtext/x-diff; charset=us-asciiDownload+272-396
#2Corey Huinker
corey.huinker@gmail.com
In reply to: Michael Paquier (#1)
Re: Getting rid of SQLValueFunction

On Fri, Sep 30, 2022 at 2:04 AM Michael Paquier <michael@paquier.xyz> wrote:

Hi all,

I have bumped a few days ago on the fact that COERCE_SQL_SYNTAX
(introduced by 40c24bf) and SQLValueFunction are around to do the
exact same thing, as known as enforcing single-function calls with
dedicated SQL keywords. For example, keywords like SESSION_USER,
CURRENT_DATE, etc. go through SQLValueFunction and rely on the parser
to set a state that gets then used in execExprInterp.c. And it is
rather easy to implement incorrect SQLValueFunctions, as these rely on
more hardcoded assumptions in the parser and executor than the
equivalent FuncCalls (like collation to assign when using a text-like
SQLValueFunctions).

There are two categories of single-value functions:
- The ones returning names, where we enforce a C collation in two
places of the code (current_role, role, current_catalog,
current_schema, current_database, current_user), even if
get_typcollation() should do that for name types.
- The ones working on time, date and timestamps (localtime[stamp],
current_date, current_time[stamp]), for 9 patterns as these accept an
optional typmod.

I have dug into the possibility to unify all that with a single
interface, and finish with the attached patch set which is a reduction
of code, where all the SQLValueFunctions are replaced by a set of
FuncCalls:
25 files changed, 338 insertions(+), 477 deletions(-)

0001 is the move done for the name-related functions, cleaning up two
places in the executor when a C collation is assigned to those
function expressions. 0002 is the remaining cleanup for the
time-related ones, moving a set of parser-side checks to the execution
path within each function, so as all this knowledge is now local to
each file holding the date and timestamp types. Most of the gain is
in 0002, obviously.

The pg_proc entries introduced for the sake of the move use the same
name as the SQL keywords. These should perhaps be prefixed with a
"pg_" at least. There would be an exception with pg_localtime[stamp],
though, where we could use a pg_localtime[stamp]_sql for the function
name for prosrc. I am open to suggestions for these names.

Thoughts?
--
Michael

I like this a lot. Deleted code is debugged code.

Patch applies and passes make check-world.

No trace of SQLValueFunction is left in the codebase, at least according to
`git grep -l`.

I have only one non-nitpick question about the code:

+ /*
+ * we're not too tense about good error message here because grammar
+ * shouldn't allow wrong number of modifiers for TIME
+ */
+ if (n != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid type modifier")));

I agree that we shouldn't spend too much effort on a good error message
here, but perhaps we should have the message mention that it is
date/time-related? A person git-grepping for this error message will get 4
hits in .c files (date.c, timestamp.c, varbit.c, varchar.c) so even a
slight variation in the message could save them some time.

This is an extreme nitpick, but the patchset seems like it should have been
1 file or 3 (remove name functions, remove time functions, remove
SQLValueFunction infrastructure), but that will only matter in the unlikely
case that we find a need for SQLValueFunction but we want to leave the
timestamp function as COERCE_SQL_SYNTAX.

#3Michael Paquier
michael@paquier.xyz
In reply to: Corey Huinker (#2)
Re: Getting rid of SQLValueFunction

(Adding Tom in CC, in case.)

On Tue, Oct 18, 2022 at 04:35:33PM -0400, Corey Huinker wrote:

I agree that we shouldn't spend too much effort on a good error message
here, but perhaps we should have the message mention that it is
date/time-related? A person git-grepping for this error message will get 4
hits in .c files (date.c, timestamp.c, varbit.c, varchar.c) so even a
slight variation in the message could save them some time.

The message is the same between HEAD and the patch and these have been
around for a long time, except that we would see it at parsing time on
HEAD, and at executor time with the patch. I would not mind changing
if there are better ideas than what's used now, of course ;)

This is an extreme nitpick, but the patchset seems like it should have been
1 file or 3 (remove name functions, remove time functions, remove
SQLValueFunction infrastructure), but that will only matter in the unlikely
case that we find a need for SQLValueFunction but we want to leave the
timestamp function as COERCE_SQL_SYNTAX.

Once the timestamp functions are removed, SQLValueFunction is just
dead code so including its removal in 0002 does not change much in my
opinion.

An other thing I had on my list for this patch was to check its
performance impact. So I have spent some time having a look at the
perf profiles produced on HEAD and with the patch using queries like
SELECT current_role FROM generate_series(1,N) where N > 10M and I have
not noticed any major differences in runtime or in the profiles, at
the difference that we don't have anymore SQLValueFunction() and its
internal functions called, hence they are missing from the stacks, but
that's the whole point of the patch.

With this in mind, would somebody complain if I commit that? That's a
nice reduction in code, while completing the work done in 40c24bf:
25 files changed, 338 insertions(+), 477 deletions(-)

Thanks,
--
Michael

#4Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#3)
Re: Getting rid of SQLValueFunction

On Wed, Oct 19, 2022 at 03:45:48PM +0900, Michael Paquier wrote:

With this in mind, would somebody complain if I commit that? That's a
nice reduction in code, while completing the work done in 40c24bf:
25 files changed, 338 insertions(+), 477 deletions(-)

On second look, there is something I have underestimated here with
FigureColnameInternal(). This function would create an attribute name
based on the SQL keyword given in input. For example, on HEAD we
would get that:
=# SELECT * FROM CURRENT_CATALOG;
current_catalog
-----------------
postgres
(1 row)

But the patch enforces the attribute name to be the underlying
function name, switching the previous "current_catalog" to
"current_database". For example:
=# SELECT * FROM CURRENT_CATALOG;
current_database
------------------
postgres
(1 row)

I am not sure how much it matters in practice, but this could break
some queries. One way to tackle that is to extend
FigureColnameInternal() so as we use a compatible name when the node
is a T_FuncCall, but that won't be entirely water-proof as long as
there is not a one-one mapping between the SQL keywords and the
underlying function names, aka we would need a current_catalog.
"user" would be also too generic as a catalog function name, so we
should name its proc entry to a pg_user anyway, requiring a shortcut
in FigureColnameInternal(). Or perhaps I am worrying too much and
keeping the code simpler is better? Does the SQL specification
require that the attribute name has to match its SQL keyword when
specified in a FROM clause when there is no aliases?

Thoughts?
--
Michael

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#4)
Re: Getting rid of SQLValueFunction

Michael Paquier <michael@paquier.xyz> writes:

But the patch enforces the attribute name to be the underlying
function name, switching the previous "current_catalog" to
"current_database".

The entire point of SQLValueFunction IMO was to hide the underlying
implementation(s). Replacing it with something that leaks
implementation details does not seem like a step forward.

regards, tom lane

#6Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#5)
Re: Getting rid of SQLValueFunction

On Thu, Oct 20, 2022 at 11:10:22PM -0400, Tom Lane wrote:

The entire point of SQLValueFunction IMO was to hide the underlying
implementation(s). Replacing it with something that leaks
implementation details does not seem like a step forward.

Hmm.. Okay, thanks. So this just comes down that I am going to need
one different pg_proc entry per SQL keyword, then, or this won't fly
far. For example, note that on HEAD or with the patch, a view with a
SQL keyword in a FROM clause translates the same way with quotes
applied in the same places, as of:
=# create view test as select (SELECT * FROM CURRENT_USER) as cu;
CREATE VIEW
=# select pg_get_viewdef('test', true);
pg_get_viewdef
---------------------------------------------------------------------
SELECT ( SELECT "current_user"."current_user" +
FROM CURRENT_USER "current_user"("current_user")) AS cu;
(1 row)

A sticky point is that this would need the creation of a pg_proc entry
for "user" which is a generic word, or a shortcut around
FigureColnameInternal(). The code gain overall still looks appealing
in the executor, even if we do all that and the resulting backend code
gets kind of nicer and easier to maintain long-term IMO.
--
Michael

#7Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#6)
Re: Getting rid of SQLValueFunction

On Fri, Oct 21, 2022 at 12:34:23PM +0900, Michael Paquier wrote:

A sticky point is that this would need the creation of a pg_proc entry
for "user" which is a generic word, or a shortcut around
FigureColnameInternal(). The code gain overall still looks appealing
in the executor, even if we do all that and the resulting backend code
gets kind of nicer and easier to maintain long-term IMO.

I have looked at that, and the attribute mapping remains compatible
with past versions once the appropriate pg_proc entries are added.
The updated patch set attached does that (with a user() function as
well to keep the code a maximum simple), with more tests to cover the
attribute case mentioned upthread.
--
Michael

Attachments:

v2-0001-Remove-from-SQLValueFunction-all-the-entries-usin.patchtext/x-diff; charset=us-asciiDownload+105-103
v2-0002-Replace-SQLValueFunction-by-direct-function-calls.patchtext/x-diff; charset=us-asciiDownload+308-393
#8Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#7)
Re: Getting rid of SQLValueFunction

On Fri, Oct 21, 2022 at 02:27:07PM +0900, Michael Paquier wrote:

I have looked at that, and the attribute mapping remains compatible
with past versions once the appropriate pg_proc entries are added.
The updated patch set attached does that (with a user() function as
well to keep the code a maximum simple), with more tests to cover the
attribute case mentioned upthread.

Attached is a rebased patch set, as of the conflicts from 2e0d80c.
--
Michael

Attachments:

v3-0001-Remove-from-SQLValueFunction-all-the-entries-usin.patchtext/x-diff; charset=us-asciiDownload+55-99
v3-0002-Replace-SQLValueFunction-by-direct-function-calls.patchtext/x-diff; charset=us-asciiDownload+245-393
#9Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#8)
Re: Getting rid of SQLValueFunction

On Tue, Oct 25, 2022 at 02:20:12PM +0900, Michael Paquier wrote:

Attached is a rebased patch set, as of the conflicts from 2e0d80c.

So, this patch set has been sitting in the CF app for a few weeks now,
and I would like to apply them to remove a bit of code from the
executor.

Please note that in order to avoid tweaks when choosing the attribute
name of function call, this needs a total of 8 new catalog functions
mapping to the SQL keywords, which is what the test added by 2e0d80c
is about:
- current_role
- user
- current_catalog
- current_date
- current_time
- current_timestamp
- localtime
- localtimestamp

Any objections?
--
Michael

#10Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#9)
Re: Getting rid of SQLValueFunction

On Fri, Nov 18, 2022 at 10:23:58AM +0900, Michael Paquier wrote:

Please note that in order to avoid tweaks when choosing the attribute
name of function call, this needs a total of 8 new catalog functions
mapping to the SQL keywords, which is what the test added by 2e0d80c
is about:
- current_role
- user
- current_catalog
- current_date
- current_time
- current_timestamp
- localtime
- localtimestamp

Any objections?

Hearing nothing, I have gone through 0001 again and applied it as
fb32748 to remove the dependency between names and SQLValueFunction.
Attached is 0002, to bring back the CI to a green state.
--
Michael

Attachments:

v4-0001-Replace-SQLValueFunction-by-direct-function-calls.patchtext/x-diff; charset=us-asciiDownload+245-393
#11Ted Yu
yuzhihong@gmail.com
In reply to: Michael Paquier (#10)
Re: Getting rid of SQLValueFunction

On Sat, Nov 19, 2022 at 7:01 PM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Nov 18, 2022 at 10:23:58AM +0900, Michael Paquier wrote:

Please note that in order to avoid tweaks when choosing the attribute
name of function call, this needs a total of 8 new catalog functions
mapping to the SQL keywords, which is what the test added by 2e0d80c
is about:
- current_role
- user
- current_catalog
- current_date
- current_time
- current_timestamp
- localtime
- localtimestamp

Any objections?

Hearing nothing, I have gone through 0001 again and applied it as
fb32748 to remove the dependency between names and SQLValueFunction.
Attached is 0002, to bring back the CI to a green state.
--
Michael

Hi,
For get_func_sql_syntax(), the code for cases
of F_CURRENT_TIME, F_CURRENT_TIMESTAMP, F_LOCALTIME and F_LOCALTIMESTAMP is
mostly the same.
Maybe we can introduce a helper so that code duplication is reduced.

Cheers

#12Michael Paquier
michael@paquier.xyz
In reply to: Ted Yu (#11)
Re: Getting rid of SQLValueFunction

On Sun, Nov 20, 2022 at 08:21:10AM -0800, Ted Yu wrote:

For get_func_sql_syntax(), the code for cases
of F_CURRENT_TIME, F_CURRENT_TIMESTAMP, F_LOCALTIME and F_LOCALTIMESTAMP is
mostly the same.
Maybe we can introduce a helper so that code duplication is reduced.

It would. Thanks for the suggestion.

Do you like something like the patch 0002 attached? This reduces a
bit the overall size of the patch. Both ought to be merged in the
same commit, still it is easier to see the simplification created.
--
Michael

Attachments:

v5-0001-Replace-SQLValueFunction-by-direct-function-calls.patchtext/x-diff; charset=us-asciiDownload+245-393
v5-0002-Simplify-back-parsing-of-SQL-callable-timestamp-f.patchtext/x-diff; charset=us-asciiDownload+31-49
#13Ted Yu
yuzhihong@gmail.com
In reply to: Michael Paquier (#12)
Re: Getting rid of SQLValueFunction

On Sun, Nov 20, 2022 at 3:12 PM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Nov 20, 2022 at 08:21:10AM -0800, Ted Yu wrote:

For get_func_sql_syntax(), the code for cases
of F_CURRENT_TIME, F_CURRENT_TIMESTAMP, F_LOCALTIME and F_LOCALTIMESTAMP

is

mostly the same.
Maybe we can introduce a helper so that code duplication is reduced.

It would. Thanks for the suggestion.

Do you like something like the patch 0002 attached? This reduces a
bit the overall size of the patch. Both ought to be merged in the
same commit, still it is easier to see the simplification created.
--
Michael

Hi,
Thanks for the quick response.

+ * timestamp.  These require a specific handling with their typmod is given
+ * by the function caller through their SQL keyword.

typo: typmod is given -> typmod given

Other than the above, code looks good to me.

Cheers

#14Michael Paquier
michael@paquier.xyz
In reply to: Ted Yu (#13)
Re: Getting rid of SQLValueFunction

On Sun, Nov 20, 2022 at 03:15:34PM -0800, Ted Yu wrote:

+ * timestamp.  These require a specific handling with their typmod is given
+ * by the function caller through their SQL keyword.

typo: typmod is given -> typmod given

Other than the above, code looks good to me.

Thanks for double-checking. I intended a different wording, actually,
so fixed this one. And applied after an extra round of reviews.
--
Michael

#15Ian Lawrence Barwick
barwick@gmail.com
In reply to: Michael Paquier (#14)
Re: Getting rid of SQLValueFunction

Hi

2022年11月21日(月) 18:39 Michael Paquier <michael@paquier.xyz>:

On Sun, Nov 20, 2022 at 03:15:34PM -0800, Ted Yu wrote:

+ * timestamp.  These require a specific handling with their typmod is given
+ * by the function caller through their SQL keyword.

typo: typmod is given -> typmod given

Other than the above, code looks good to me.

Thanks for double-checking. I intended a different wording, actually,
so fixed this one. And applied after an extra round of reviews.

I noticed this commit (f193883f) introduces following regressions:

postgres=# SELECT current_timestamp(7);
WARNING: TIMESTAMP(7) WITH TIME ZONE precision reduced to maximum
allowed, 6
ERROR: timestamp(7) precision must be between 0 and 6

postgres=# SELECT localtimestamp(7);
WARNING: TIMESTAMP(7) precision reduced to maximum allowed, 6
ERROR: timestamp(7) precision must be between 0 and 6

Suggested fix attached.

Regards

Ian Barwick

Attachments:

v1-0001-Fix-current-local-timestamp-precision-reduction.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Fix-current-local-timestamp-precision-reduction.patchDownload+20-9
#16Michael Paquier
michael@paquier.xyz
In reply to: Ian Lawrence Barwick (#15)
Re: Getting rid of SQLValueFunction

On Fri, Dec 30, 2022 at 10:57:52AM +0900, Ian Lawrence Barwick wrote:

I noticed this commit (f193883f) introduces following regressions:

postgres=# SELECT current_timestamp(7);
WARNING: TIMESTAMP(7) WITH TIME ZONE precision reduced to maximum
allowed, 6
ERROR: timestamp(7) precision must be between 0 and 6

postgres=# SELECT localtimestamp(7);
WARNING: TIMESTAMP(7) precision reduced to maximum allowed, 6
ERROR: timestamp(7) precision must be between 0 and 6

Suggested fix attached.

Thanks for the report, Ian. Will fix.
--
Michael

#17Michael Paquier
michael@paquier.xyz
In reply to: Ian Lawrence Barwick (#15)
Re: Getting rid of SQLValueFunction

On Fri, Dec 30, 2022 at 10:57:52AM +0900, Ian Lawrence Barwick wrote:

I noticed this commit (f193883f) introduces following regressions:

postgres=# SELECT current_timestamp(7);
WARNING: TIMESTAMP(7) WITH TIME ZONE precision reduced to maximum
allowed, 6
ERROR: timestamp(7) precision must be between 0 and 6

postgres=# SELECT localtimestamp(7);
WARNING: TIMESTAMP(7) precision reduced to maximum allowed, 6
ERROR: timestamp(7) precision must be between 0 and 6

Suggested fix attached.

The two changes in timestamp.c are fine, Now I can see that the same
mistake was introduced in date.c. The WARNINGs were issued and the
compilation went through the same way as the default, but they passed
down an incorrect precision, so I have fixed all that. Coverage has
been added for all four, while the patch proposed covered only two.
--
Michael