simplify regular expression locale global variables
We currently have
static PG_Locale_Strategy pg_regex_strategy;
static pg_locale_t pg_regex_locale;
static Oid pg_regex_collation;
but after the recent improvements to pg_locale_t handling, we don't need
all three anymore. All the information we have is contained in
pg_locale_t, so we just need to keep that one. This allows us to
structure the locale-using regular expression code more similar to other
locale-using code, mainly by provider, avoiding another layer that is
specific only to the regular expression code. The first patch
implements that.
The second patch removes a call to pg_set_regex_collation() that I think
is unnecessary.
The third patch adds a pg_unset_regex_collation() call that undoes what
pg_set_regex_collation() does. I mainly used this to verify the second
patch, but maybe it's also useful on its own, not sure.
(I don't have any plans to get rid of the remaining global variable.
That would certainly be nice from an intellectual point of view, but
fiddling this into the regular expression code looks quite messy. In
any case, it's probably easier with one variable instead of three, if
someone wants to try.)
Attachments:
0001-Remove-pg_regex_collation-and-pg_regex_strategy.patchtext/plain; charset=UTF-8; name=0001-Remove-pg_regex_collation-and-pg_regex_strategy.patchDownload+185-246
0002-Remove-unneeded-pg_set_regex_collation-call.patchtext/plain; charset=UTF-8; name=0002-Remove-unneeded-pg_set_regex_collation-call.patchDownload+0-4
0003-WIP-Add-pg_unset_regex_collation.patchtext/plain; charset=UTF-8; name=0003-WIP-Add-pg_unset_regex_collation.patchDownload+24-5
On 2024-Oct-15, Peter Eisentraut wrote:
@@ -253,8 +241,9 @@ pg_set_regex_collation(Oid collation) * catalog access is available, so we can't call * pg_newlocale_from_collation(). */ + static struct pg_locale_struct dummy_locale = {.ctype_is_c = true}; + + locale = &dummy_locale; } else { @@ -264,121 +253,80 @@ pg_set_regex_collation(Oid collation) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("nondeterministic collations are not supported for regular expressions"))); [...] }pg_regex_locale = locale;
}
Hmm, is it valid to make pg_regex_locale point to a function-local
static here? The lifetime of this static is not clear to me, and I
think this pattern works with at least some compilers, but I remember
comments on previous patch review threads that this pattern isn't
kosher.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
On 15.10.24 12:08, Alvaro Herrera wrote:
On 2024-Oct-15, Peter Eisentraut wrote:
@@ -253,8 +241,9 @@ pg_set_regex_collation(Oid collation) * catalog access is available, so we can't call * pg_newlocale_from_collation(). */ + static struct pg_locale_struct dummy_locale = {.ctype_is_c = true}; + + locale = &dummy_locale; } else { @@ -264,121 +253,80 @@ pg_set_regex_collation(Oid collation) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("nondeterministic collations are not supported for regular expressions"))); [...] }pg_regex_locale = locale;
}Hmm, is it valid to make pg_regex_locale point to a function-local
static here? The lifetime of this static is not clear to me, and I
think this pattern works with at least some compilers, but I remember
comments on previous patch review threads that this pattern isn't
kosher.
I think this must be okay. Some classic non-thread-safe C library
functions essentially work that way, e.g.,
char *strerror(int errnum)
{
static char buf[...];
strcpy(buf, ....);
return buf;
}
and then you can use the return pointer wherever you want.
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Hmm, is it valid to make pg_regex_locale point to a function-local
static here? The lifetime of this static is not clear to me, and I
think this pattern works with at least some compilers, but I remember
comments on previous patch review threads that this pattern isn't
kosher.
We use function-local statics in other places, and I have never
heard that it's not kosher. There would be little point in
declaring such a variable static at all if that didn't cause
it to have persistent storage.
regards, tom lane
Peter Eisentraut <peter@eisentraut.org> writes:
but after the recent improvements to pg_locale_t handling, we don't need
all three anymore. All the information we have is contained in
pg_locale_t, so we just need to keep that one. This allows us to
structure the locale-using regular expression code more similar to other
locale-using code, mainly by provider, avoiding another layer that is
specific only to the regular expression code. The first patch
implements that.
I didn't read that patch in detail; somebody who's more familiar than
I with the recent locale-code changes ought to read it and confirm
that no subtle behavioral changes are sneaking in. But +1 for
concept.
The second patch removes a call to pg_set_regex_collation() that I think
is unnecessary.
I think this is actively wrong. pg_regprefix is engaged in
determining whether there's a fixed prefix of the regex, which
at least involves a sort of symbolic execution. As an example,
whether '^x' has a fixed prefix surely depends on whether the locale
is case-insensitive. (It may be that we get such cases wrong today,
since pg_regprefix was written before we had ICU locales and I don't
know if anyone has revisited it with this in mind. But removing this
pg_set_regex_collation call is surely not going to make that better.
In any case, the gain of removing it must be microscopic.)
(I don't have any plans to get rid of the remaining global variable.
That would certainly be nice from an intellectual point of view, but
fiddling this into the regular expression code looks quite messy. In
any case, it's probably easier with one variable instead of three, if
someone wants to try.)
Yeah. Those global variables are my fault. I did try hard to avoid
having them, but came to the same conclusion that it was not worth
contorting the regex code to pass a locale pointer through it.
Maybe if we ever completely give up on maintaining code similarity
with the Tcl version, we should just bull ahead and do that; but for
now I don't want to.
regards, tom lane
On 15.10.24 17:04, Tom Lane wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
but after the recent improvements to pg_locale_t handling, we don't need
all three anymore. All the information we have is contained in
pg_locale_t, so we just need to keep that one. This allows us to
structure the locale-using regular expression code more similar to other
locale-using code, mainly by provider, avoiding another layer that is
specific only to the regular expression code. The first patch
implements that.I didn't read that patch in detail; somebody who's more familiar than
I with the recent locale-code changes ought to read it and confirm
that no subtle behavioral changes are sneaking in. But +1 for
concept.
Ok, I'll wait for someone to give it a detailed review.
The second patch removes a call to pg_set_regex_collation() that I think
is unnecessary.I think this is actively wrong. pg_regprefix is engaged in> determining whether there's a fixed prefix of the regex, which
at least involves a sort of symbolic execution. As an example,
whether '^x' has a fixed prefix surely depends on whether the locale
is case-insensitive.
Hmm, okay, I'll leave this out for now and maybe come back to it later.
For the time being, here is a new patch with this part omitted.
Attachments:
v2-0001-Remove-pg_regex_collation-and-pg_regex_strategy.patchtext/plain; charset=UTF-8; name=v2-0001-Remove-pg_regex_collation-and-pg_regex_strategy.patchDownload+185-247
On 10/15/24 8:12 AM, Peter Eisentraut wrote:
We currently have
static PG_Locale_Strategy pg_regex_strategy;
static pg_locale_t pg_regex_locale;
static Oid pg_regex_collation;but after the recent improvements to pg_locale_t handling, we don't need
all three anymore. All the information we have is contained in
pg_locale_t, so we just need to keep that one. This allows us to
structure the locale-using regular expression code more similar to other
locale-using code, mainly by provider, avoiding another layer that is
specific only to the regular expression code. The first patch
implements that.
Jeff Davis has a patch which also fixes this while refactoring other
stuff too which I prefer over your patch since it also cleans up the
collation code in general.
/messages/by-id/2830211e1b6e6a2e26d845780b03e125281ea17b.camel@j-davis.com
The second patch removes a call to pg_set_regex_collation() that I think
is unnecessary.The third patch adds a pg_unset_regex_collation() call that undoes what
pg_set_regex_collation() does. I mainly used this to verify the second
patch, but maybe it's also useful on its own, not sure.(I don't have any plans to get rid of the remaining global variable.
That would certainly be nice from an intellectual point of view, but
fiddling this into the regular expression code looks quite messy. In
any case, it's probably easier with one variable instead of three, if
someone wants to try.)
I have not looked at your other two patches yet.
Andreas
On 25.10.24 10:16, Andreas Karlsson wrote:
On 10/15/24 8:12 AM, Peter Eisentraut wrote:
We currently have
static PG_Locale_Strategy pg_regex_strategy;
static pg_locale_t pg_regex_locale;
static Oid pg_regex_collation;but after the recent improvements to pg_locale_t handling, we don't
need all three anymore. All the information we have is contained in
pg_locale_t, so we just need to keep that one. This allows us to
structure the locale-using regular expression code more similar to
other locale-using code, mainly by provider, avoiding another layer
that is specific only to the regular expression code. The first patch
implements that.Jeff Davis has a patch which also fixes this while refactoring other
stuff too which I prefer over your patch since it also cleans up the
collation code in general.https://www.postgresql.org/message-
id/2830211e1b6e6a2e26d845780b03e125281ea17b.camel%40j-davis.com
That patch set looks like a good direction.
But it doesn't remove pg_regex_collation, only pg_regex_strategy. So I
have split my v2 into two patches, the first removes pg_regex_collation
and the second removes pg_regex_strategy. The first patch is useful on
its own, I think; the second one will presumably be replaced by the
other patch series above.
On Tue, 2024-12-03 at 17:06 +0100, Peter Eisentraut wrote:
But it doesn't remove pg_regex_collation, only pg_regex_strategy. So
I
have split my v2 into two patches, the first removes
pg_regex_collation
and the second removes pg_regex_strategy. The first patch is useful
on
its own, I think;
+1, looks committable now.
the second one will presumably be replaced by the
other patch series above.
Sounds good.
Regards,
Jeff Davis
On 03.12.24 20:19, Jeff Davis wrote:
On Tue, 2024-12-03 at 17:06 +0100, Peter Eisentraut wrote:
But it doesn't remove pg_regex_collation, only pg_regex_strategy. So
I
have split my v2 into two patches, the first removes
pg_regex_collation
and the second removes pg_regex_strategy. The first patch is useful
on
its own, I think;+1, looks committable now.
done