tiny step toward threading: reduce dependence on setlocale()
There was an unconference session at pgconf.dev related to threading
support. One of the problems identified was setlocale().
The attached series of patches make collation not depend on
setlocale(), even if the database collation uses the libc provider.
Since commit 8d9a9f034e, all supported platforms have locale_t, so we
can use strcoll_l(), etc., or uselocale() when no "_l" variant is
available.
A brief test shows that there may be a performance regression for libc
default collations. But if so, I'm not sure that's avoidable if the
goal is to take away setlocale. I'll see if removing the extra branches
mitigates it.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
Attachments:
v1-0001-Make-database-default-collation-internal-to-pg_lo.patchtext/x-patch; charset=UTF-8; name=v1-0001-Make-database-default-collation-internal-to-pg_lo.patchDownload+66-37
v1-0002-Make-database-collation-pg_locale_t-always-non-NU.patchtext/x-patch; charset=UTF-8; name=v1-0002-Make-database-collation-pg_locale_t-always-non-NU.patchDownload+52-7
v1-0003-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patchtext/x-patch; charset=UTF-8; name=v1-0003-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patchDownload+9-8
v1-0004-Remove-support-for-null-pg_locale_t.patchtext/x-patch; charset=UTF-8; name=v1-0004-Remove-support-for-null-pg_locale_t.patchDownload+27-45
v1-0005-Avoid-setlocale-in-lc_collate_is_c-and-lc_ctype_i.patchtext/x-patch; charset=UTF-8; name=v1-0005-Avoid-setlocale-in-lc_collate_is_c-and-lc_ctype_i.patchDownload+57-49
On Wed, 2024-06-05 at 17:23 -0700, Jeff Davis wrote:
A brief test shows that there may be a performance regression for
libc
default collations. But if so, I'm not sure that's avoidable if the
goal is to take away setlocale. I'll see if removing the extra
branches
mitigates it.
I redid the test and didn't see a difference, and then I ran a
standalone microbenchmark to compare strcoll() vs strcoll_l(), and
didn't see a difference there, either.
Another implementation may show a difference, but it doesn't seem to be
a problem for glibc.
I think this patch series is a nice cleanup, as well, making libc more
like the other providers and not dependent on global state.
Regards,
Jeff Davis
On Thu, 2024-06-06 at 11:37 -0700, Jeff Davis wrote:
I think this patch series is a nice cleanup, as well, making libc
more
like the other providers and not dependent on global state.
New rebased series attached with additional cleanup. Now that
pg_locale_t is never NULL, we can simplify the way the collation cache
works, eliminating ~100 lines.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
Attachments:
v2-0001-Make-database-default-collation-internal-to-pg_lo.patchtext/x-patch; charset=UTF-8; name=v2-0001-Make-database-default-collation-internal-to-pg_lo.patchDownload+66-37
v2-0002-Make-database-collation-pg_locale_t-always-non-NU.patchtext/x-patch; charset=UTF-8; name=v2-0002-Make-database-collation-pg_locale_t-always-non-NU.patchDownload+52-7
v2-0003-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patchtext/x-patch; charset=UTF-8; name=v2-0003-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patchDownload+9-8
v2-0004-Remove-support-for-null-pg_locale_t.patchtext/x-patch; charset=UTF-8; name=v2-0004-Remove-support-for-null-pg_locale_t.patchDownload+27-45
v2-0005-Avoid-setlocale-in-lc_collate_is_c-and-lc_ctype_i.patchtext/x-patch; charset=UTF-8; name=v2-0005-Avoid-setlocale-in-lc_collate_is_c-and-lc_ctype_i.patchDownload+57-49
v2-0006-Simplify-collation-cache.patchtext/x-patch; charset=UTF-8; name=v2-0006-Simplify-collation-cache.patchDownload+28-135
On 15.06.24 01:35, Jeff Davis wrote:
On Thu, 2024-06-06 at 11:37 -0700, Jeff Davis wrote:
I think this patch series is a nice cleanup, as well, making libc
more
like the other providers and not dependent on global state.New rebased series attached with additional cleanup. Now that
pg_locale_t is never NULL, we can simplify the way the collation cache
works, eliminating ~100 lines.
Overall, this is great. I have wished for a simplification like this
for a long time. It is the logical continuation of relying on
locale_t stuff rather than process-global settings.
* v2-0001-Make-database-default-collation-internal-to-pg_lo.patch
+void
+pg_init_database_collation()
The function argument should be (void).
The prefix pg_ makes it sound like it's a user-facing function of some
sort. Is there a reason for it?
Maybe add a small comment before the function.
* v2-0002-Make-database-collation-pg_locale_t-always-non-NU.patch
There is quite a bit of code duplication from
pg_newlocale_from_collation(). Can we do this better?
* v2-0003-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patch
The "TODO" markers should be left, because what they refer to is that
these functions just use the default collation rather than something
passed in from the expression machinery. This is not addressed by
this change. (Obviously, the comments could have been clearer about
this.)
* v2-0004-Remove-support-for-null-pg_locale_t.patch
I found a few more places that should be adjusted in a similar way.
- In src/backend/regex/regc_pg_locale.c, the whole business with
pg_regex_locale, in particular in pg_set_regex_collation().
- In src/backend/utils/adt/formatting.c, various places such as
str_tolower().
- In src/backend/utils/adt/pg_locale.c, wchar2char() and char2wchar().
(Also, for wchar2char() there is one caller that explicitly passes
NULL.)
The changes at the call sites of pg_locale_deterministic() are
unfortunate, because they kind of go into the opposite direction: They
add checks for NULL locales instead of removing them. (For a minute I
was thinking I was reading your patch backwards.) We should think of
a way to write this clearer.
Looking for example at hashtext() after 0001..0004 applied, it is
if (!lc_collate_is_c(collid))
mylocale = pg_newlocale_from_collation(collid);
if (!mylocale || pg_locale_deterministic(mylocale))
{
But then after your 0006 patch, lc_locale_is_c() internally also calls
pg_newlocale_from_collation(), so this code just becomes redundant.
Better might be if pg_locale_deterministic() itself checked if collate
is C, and then hashtext() would just need to write:
mylocale = pg_newlocale_from_collation(collid);
if (pg_locale_deterministic(mylocale))
{
The patch sequencing might be a bit tricky here. Maybe it's ok if
patch 0004 stays as is in this respect if 0006 were to fix it back.
* v2-0005-Avoid-setlocale-in-lc_collate_is_c-and-lc_ctype_i.patch
Nothing uses this, AFAICT, so why?
Also, this creates more duplication between
pg_init_database_collation() and pg_newlocale_from_collation(), as
mentioned at patch 0002.
* v2-0006-Simplify-collation-cache.patch
ok
On Wed, 2024-06-19 at 11:15 +0200, Peter Eisentraut wrote:
* v2-0001-Make-database-default-collation-internal-to-pg_lo.patch
+void
+pg_init_database_collation()The function argument should be (void).
The prefix pg_ makes it sound like it's a user-facing function of >
some
sort. Is there a reason for it?Maybe add a small comment before the function.
Agreed, done.
* v2-0002-Make-database-collation-pg_locale_t-always-non-NU.patch
There is quite a bit of code duplication from
pg_newlocale_from_collation(). Can we do this better?
I refactored it into make_libc_collator().
* v2-0003-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patch
The "TODO" markers should be left, because what they refer to is
that
these functions just use the default collation rather than
something
passed in from the expression machinery. This is not addressed by
this change. (Obviously, the comments could have been clearer
about
this.)
Done.
* v2-0004-Remove-support-for-null-pg_locale_t.patch
I found a few more places that should be adjusted in a similar way.
Fixed, thank you.
The changes at the call sites of pg_locale_deterministic() are
unfortunate
Yeah, that part was a bit annoying.
But then after your 0006 patch, lc_locale_is_c() internally also >
calls
pg_newlocale_from_collation()
Not always. It still returns early for C_COLLATION_OID or
POSIX_COLLATION_OID, and that's actually required because
pg_regcomp(..., C_COLLATION_OID) is called when parsing pg_hba.conf,
before catalog access is available. I don't think that detail is
relevant in the cases you brought up, but it got in the way of some
other refactoring I was trying to do.
, so this code just becomes redundant.
Better might be if pg_locale_deterministic() itself checked if >
collate
is C, and then hashtext() would just need to write:mylocale = pg_newlocale_from_collation(collid);
if (pg_locale_deterministic(mylocale))
{The patch sequencing might be a bit tricky here. Maybe it's ok if
patch 0004 stays as is in this respect if 0006 were to fix it back.
Addressed in v3-0006.
* v2-0005-Avoid-setlocale-in-lc_collate_is_c-and-lc_ctype_i.patch
Nothing uses this, AFAICT, so why?
You're right. It was used to avoid setlocale() in lc_collate_is_c(),
but that's eliminated in the next patch anyway.
Also, in v3-0005, I had to also check for "C" or "POSIX" in
make_libc_collator(), so that it wouldn't try to actually create the
locale_t in that case.
Regards,
Jeff Davis
Attachments:
v3-0001-Make-database-default-collation-internal-to-pg_lo.patchtext/x-patch; charset=UTF-8; name=v3-0001-Make-database-default-collation-internal-to-pg_lo.patchDownload+69-37
v3-0002-Make-database-collation-pg_locale_t-always-non-NU.patchtext/x-patch; charset=UTF-8; name=v3-0002-Make-database-collation-pg_locale_t-always-non-NU.patchDownload+92-80
v3-0003-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patchtext/x-patch; charset=UTF-8; name=v3-0003-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patchDownload+22-8
v3-0004-Remove-support-for-null-pg_locale_t.patchtext/x-patch; charset=UTF-8; name=v3-0004-Remove-support-for-null-pg_locale_t.patchDownload+56-235
v3-0005-Simplify-collation-cache.patchtext/x-patch; charset=UTF-8; name=v3-0005-Simplify-collation-cache.patchDownload+64-157
v3-0006-Clean-up-more-checks-for-NULL-pg_locale_t.patchtext/x-patch; charset=UTF-8; name=v3-0006-Clean-up-more-checks-for-NULL-pg_locale_t.patchDownload+27-46
Nice refactoring!
Two small comments about CheckMyDatabase().
- Shouldn't we look at the default_locale.ctype_is_c when setting
database_ctype_is_c instead of doing a strcmp()? or maybe we should even
remove the global variable and always look at the default_locale?
- I think that the lookup of Anum_pg_database_datlocale could be done
later in the code since it is not needed when we use a libc locale. E.g.
as below.
if (dbform->datlocprovider == COLLPROVIDER_LIBC)
locale = collate;
else
{
datum = SysCacheGetAttr(DATABASEOID, tup,
Anum_pg_database_datlocale, &isnull);
if (!isnull)
locale = TextDatumGetCString(datum);
}
Also is there any reaosn you do not squash th 4th and the 6th patch?
Andreas
On Fri, 2024-07-26 at 19:38 +0200, Andreas Karlsson wrote:
Nice refactoring!
Two small comments about CheckMyDatabase().
- Shouldn't we look at the default_locale.ctype_is_c when setting
database_ctype_is_c instead of doing a strcmp()? or maybe we should
even
remove the global variable and always look at the default_locale?
database_ctype_is_c refers to the LC_CTYPE environment of the database
-- pg_database.datctype. default_locale.ctype_is_c is the ctype of the
database's default collation.
Confusing, I know, but it matters for a few things that still depend on
the LC_CTYPE, such as tsearch and maybe a few extensions. See
f413941f41.
- I think that the lookup of Anum_pg_database_datlocale could be done
later in the code since it is not needed when we use a libc locale.
E.g.
as below.
Done, thank you.
Also is there any reaosn you do not squash th 4th and the 6th patch?
Done. I had to rearrange the patch ordering a bit because prior to the
cache refactoring patch, it's unsafe to call
pg_newlocale_from_collation() without checking lc_collate_is_c() or
lc_ctype_is_c() first.
Regards,
Jeff Davis
Attachments:
v4-0001-Make-database-default-collation-internal-to-pg_lo.patchtext/x-patch; charset=UTF-8; name=v4-0001-Make-database-default-collation-internal-to-pg_lo.patchDownload+74-43
v4-0002-Make-database-collation-pg_locale_t-always-non-NU.patchtext/x-patch; charset=UTF-8; name=v4-0002-Make-database-collation-pg_locale_t-always-non-NU.patchDownload+110-82
v4-0003-Refactor-collation-cache.patchtext/x-patch; charset=UTF-8; name=v4-0003-Refactor-collation-cache.patchDownload+40-155
v4-0004-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patchtext/x-patch; charset=UTF-8; name=v4-0004-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patchDownload+22-8
v4-0005-Remove-support-for-null-pg_locale_t.patchtext/x-patch; charset=UTF-8; name=v4-0005-Remove-support-for-null-pg_locale_t.patchDownload+69-266
On 7/26/24 10:35 PM, Jeff Davis wrote:
database_ctype_is_c refers to the LC_CTYPE environment of the database
-- pg_database.datctype. default_locale.ctype_is_c is the ctype of the
database's default collation.Confusing, I know, but it matters for a few things that still depend on
the LC_CTYPE, such as tsearch and maybe a few extensions. See
f413941f41.
Ah, right! That was thinko on my behalf.
The set of patches looks good to me now. There is further refactoring
that can be done in this area (and should be done given all calls e.g to
isalpha()) but I think this set of patches improves code readability
while moving us away from setlocale().
And even if we take a tiny performance hit here, which your tests did
not measure, I would say it is worth it both due to code clarity and due
to not relying on thread unsafe state.
I do not see these patches in the commitfest app but if they were I
would have marked them as ready for committer.
Andreas
On 27.07.24 21:03, Andreas Karlsson wrote:
On 7/26/24 10:35 PM, Jeff Davis wrote:
database_ctype_is_c refers to the LC_CTYPE environment of the database
-- pg_database.datctype. default_locale.ctype_is_c is the ctype of the
database's default collation.Confusing, I know, but it matters for a few things that still depend on
the LC_CTYPE, such as tsearch and maybe a few extensions. See
f413941f41.Ah, right! That was thinko on my behalf.
The set of patches looks good to me now. There is further refactoring
that can be done in this area (and should be done given all calls e.g to
isalpha()) but I think this set of patches improves code readability
while moving us away from setlocale().And even if we take a tiny performance hit here, which your tests did
not measure, I would say it is worth it both due to code clarity and due
to not relying on thread unsafe state.I do not see these patches in the commitfest app but if they were I
would have marked them as ready for committer.
Here: https://commitfest.postgresql.org/48/5023/
I have also re-reviewed the patches and I agree they are good to go.
On Mon, 2024-07-29 at 21:45 +0200, Peter Eisentraut wrote:
I have also re-reviewed the patches and I agree they are good to go.
I found a couple issues with the later patches:
* There was still some confusion about the default collation vs.
datcollate/datctype for callers of wchar2char() and char2wchar() (those
functions only work for libc). I introduced a new pg_locale_t structure
to represent datcollate/datctype regardless of datlocprovider to solve
this.
* Another loose end relying on setlocale(): in selfuncs.c, there's
still a call directly to strxfrm(), which depends on setlocale(). I
changed this to lookup the collation and then use pg_strxfrm(). That
should improve histogram selectivity estimates because it uses the
correct provider, rather than relying on setlocale(), right?
New series attached.
Regards,
Jeff Davis
Attachments:
v6-0001-Make-datcollate-datctype-accessible-as-a-pg_local.patchtext/x-patch; charset=UTF-8; name=v6-0001-Make-datcollate-datctype-accessible-as-a-pg_local.patchDownload+74-25
v6-0002-Remove-support-for-null-pg_locale_t.patchtext/x-patch; charset=UTF-8; name=v6-0002-Remove-support-for-null-pg_locale_t.patchDownload+69-265
v6-0003-selfuncs.c-use-pg_strxfrm-instead-of-strxfrm.patchtext/x-patch; charset=UTF-8; name=v6-0003-selfuncs.c-use-pg_strxfrm-instead-of-strxfrm.patchDownload+23-10
On Tue, 2024-07-30 at 12:13 -0700, Jeff Davis wrote:
I found a couple issues with the later patches:
* There was still some confusion about the default collation vs.
datcollate/datctype for callers of wchar2char() and char2wchar()
(those
functions only work for libc). I introduced a new pg_locale_t
structure
to represent datcollate/datctype regardless of datlocprovider to
solve
this.
I didn't quite like the API I introduced in 0001, so I skipped 0001.
For 0002 I left char2wchar() and wchar2char() as-is, where they expect
libc and accept a NULL pg_locale_t. I committed the rest of 0002.
* Another loose end relying on setlocale(): in selfuncs.c, there's
still a call directly to strxfrm(), which depends on setlocale(). I
changed this to lookup the collation and then use pg_strxfrm(). That
should improve histogram selectivity estimates because it uses the
correct provider, rather than relying on setlocale(), right?
Committed 0003.
With these changes, collations are no longer dependent on the
environment locale (setlocale()) at all for either collation behavior
(ORDER BY) or ctype behavior (LOWER(), etc.).
Additionally, unless I missed something, nothing in the server is
dependent on LC_COLLATE at all.
There are still some things that depend on setlocale() in one way or
another:
- char2wchar() & wchar2char()
- ts_locale.c
- various places that depend on LC_CTYPE unrelated to the collation
infrastructure
- things that depend on other locale settings, like LC_NUMERIC
We can address those as part of a separate thread. I'll count this as
committed.
Regards,
Jeff Davis
On 06.08.24 23:40, Jeff Davis wrote:
With these changes, collations are no longer dependent on the
environment locale (setlocale()) at all for either collation behavior
(ORDER BY) or ctype behavior (LOWER(), etc.).Additionally, unless I missed something, nothing in the server is
dependent on LC_COLLATE at all.There are still some things that depend on setlocale() in one way or
another:- char2wchar() & wchar2char()
- ts_locale.c
- various places that depend on LC_CTYPE unrelated to the collation
infrastructure
- things that depend on other locale settings, like LC_NUMERICWe can address those as part of a separate thread. I'll count this as
committed.
I have a couple of small follow-up patches for this.
First, in like.c, SB_lower_char() now reads:
static char
SB_lower_char(unsigned char c, pg_locale_t locale, bool locale_is_c)
{
if (locale_is_c)
return pg_ascii_tolower(c);
else if (locale)
return tolower_l(c, locale->info.lt);
else
return pg_tolower(c);
}
But after this patch set, locale cannot be NULL anymore, so the third
branch is obsolete.
(Now that I look at it, pg_tolower() has some short-circuiting for ASCII
letters, so it would not handle Turkish-i correctly if that had been the
global locale. By removing the use of pg_tolower(), we fix that issue
in passing.)
Second, there are a number of functions in like.c like the above that
take separate arguments like pg_locale_t locale, bool locale_is_c.
Because pg_locale_t now contains the locale_is_c information, these can
be combined.
On Wed, 2024-08-07 at 22:44 +0200, Peter Eisentraut wrote:
But after this patch set, locale cannot be NULL anymore, so the third
branch is obsolete.
...
Second, there are a number of functions in like.c like the above that
take separate arguments like pg_locale_t locale, bool locale_is_c.
Because pg_locale_t now contains the locale_is_c information, these
can
be combined.
I believe these patches are correct, but the reasoning is fairly
complex:
1. Some MatchText variants are called with 0 for locale. But that's OK
because ...
2. A MatchText variant only cares about the locale if MATCH_LOWER(t) is
defined, and ...
3. Only one variant, SB_IMatchText() defines MATCH_LOWER(), and ...
4. SB_IMatchText() is called with a non-zero locale.
All of these are a bit confusing to follow because it's generated code.
#2 is particularly non-obvious, because "locale" is not even an
argument of the MATCH_LOWER(t) or GETCHAR(t) macros, it's taken
implicitly from the outer scope.
I don't think your patches cause this confusion, but is there a way you
can clarify some of this along the way?
Regards,
Jeff Davis
On 07.08.24 22:44, Peter Eisentraut wrote:
(Now that I look at it, pg_tolower() has some short-circuiting for ASCII
letters, so it would not handle Turkish-i correctly if that had been the
global locale. By removing the use of pg_tolower(), we fix that issue
in passing.)
It occurred to me that this issue also surfaces in a more prominent
place. These arguably-wrong pg_tolower() and pg_toupper() calls were
also used by the normal SQL lower() and upper() functions before commit
e9931bfb751 if you used a single byte encoding.
For example, in PG17, multi-byte encoding:
initdb --locale=tr_TR.utf8
select upper('hij'); --> HİJ
PG17, single-byte encoding:
initdb --locale=tr_TR # uses LATIN5
select upper('hij'); --> HIJ
With current master, after commit e9931bfb751, you get the first result
in both cases.
So this could break indexes across pg_upgrade in such configurations.
On 08.08.24 22:00, Jeff Davis wrote:
On Wed, 2024-08-07 at 22:44 +0200, Peter Eisentraut wrote:
But after this patch set, locale cannot be NULL anymore, so the third
branch is obsolete....
Second, there are a number of functions in like.c like the above that
take separate arguments like pg_locale_t locale, bool locale_is_c.
Because pg_locale_t now contains the locale_is_c information, these
can
be combined.I believe these patches are correct, but the reasoning is fairly
complex:1. Some MatchText variants are called with 0 for locale. But that's OK
because ...2. A MatchText variant only cares about the locale if MATCH_LOWER(t) is
defined, and ...3. Only one variant, SB_IMatchText() defines MATCH_LOWER(), and ...
4. SB_IMatchText() is called with a non-zero locale.
All of these are a bit confusing to follow because it's generated code.
#2 is particularly non-obvious, because "locale" is not even an
argument of the MATCH_LOWER(t) or GETCHAR(t) macros, it's taken
implicitly from the outer scope.I don't think your patches cause this confusion, but is there a way you
can clarify some of this along the way?
Yes, this is also my analysis. The patch in
</messages/by-id/700d2e86-bf75-4607-9cf2-f5b7802f6e88@eisentraut.org>
would replace passing 0 with an actual locale object. The changes in
GenericMatchText() could also be applied independently so that we'd
always pass in a non-zero locale value, even if it would not be used in
some cases. I need to update that patch to cover your latest changes.
I'll see if I can propose something here that looks a bit nicer.
On Thu, 2024-08-08 at 23:09 +0200, Peter Eisentraut wrote:
With current master, after commit e9931bfb751, you get the first
result
in both cases.So this could break indexes across pg_upgrade in such configurations.
Good observation. Is there any action we should take here, or just add
that to the release notes for 18?
Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes:
We can address those as part of a separate thread. I'll count this as
committed.
Coverity has a nit about this:
*** CID 1616189: Null pointer dereferences (REVERSE_INULL)
/srv/coverity/git/pgsql-git/postgresql/src/backend/utils/adt/like.c: 206 in Generic_Text_IC_like()
200 * on the pattern and text, but instead call SB_lower_char on each
201 * character. In the multi-byte case we don't have much choice :-(. Also,
202 * ICU does not support single-character case folding, so we go the long
203 * way.
204 */
205
CID 1616189: Null pointer dereferences (REVERSE_INULL)
Null-checking "locale" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
206 if (pg_database_encoding_max_length() > 1 || (locale && locale->provider == COLLPROVIDER_ICU))
207 {
208 pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
209 PointerGetDatum(pat)));
210 p = VARDATA_ANY(pat);
211 plen = VARSIZE_ANY_EXHDR(pat);
I assume it would now be okay to take out "locale &&" here?
regards, tom lane
On 11.08.24 18:33, Tom Lane wrote:
Jeff Davis <pgsql@j-davis.com> writes:
We can address those as part of a separate thread. I'll count this as
committed.Coverity has a nit about this:
*** CID 1616189: Null pointer dereferences (REVERSE_INULL)
/srv/coverity/git/pgsql-git/postgresql/src/backend/utils/adt/like.c: 206 in Generic_Text_IC_like()
200 * on the pattern and text, but instead call SB_lower_char on each
201 * character. In the multi-byte case we don't have much choice :-(. Also,
202 * ICU does not support single-character case folding, so we go the long
203 * way.
204 */
205CID 1616189: Null pointer dereferences (REVERSE_INULL)
Null-checking "locale" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.206 if (pg_database_encoding_max_length() > 1 || (locale && locale->provider == COLLPROVIDER_ICU))
207 {
208 pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
209 PointerGetDatum(pat)));
210 p = VARDATA_ANY(pat);
211 plen = VARSIZE_ANY_EXHDR(pat);I assume it would now be okay to take out "locale &&" here?
Correct, that check is no longer necessary.
On Mon, 2024-08-12 at 09:00 +0200, Peter Eisentraut wrote:
I assume it would now be okay to take out "locale &&" here?
Correct, that check is no longer necessary.
Thank you, fixed.
Regards,
Jeff Davis
I'm wondering if after this patch series, lc_collate_is_c() and
lc_ctype_is_c() are still useful.
They used to be completely separate from pg_newlocale_from_collation(),
but now they are just mostly a thin wrapper around it. Except there is
some hardcoded handling of C_COLLATION_OID and POSIX_COLLATION_OID. Do
we care about that?
In many places, the notational and structural complexity would be
significantly improved if we changed code like
if (pg_collate_is_c(colloid))
{
...
}
else
{
pg_locale_t locale = pg_newlocale_from_collation(colloid);
if (locale->provider == ...)
{
...
}
to more like
pg_locale_t locale = pg_newlocale_from_collation(colloid);
if (locale->collate_is_c)
{
...
}
else if (locale->provider == ...)
...
}
...
However, it's not clear whether the hardcoded handling of some
collations is needed for performance parity or perhaps some
bootstrapping reasons. It would be useful to get that cleared up.
Thoughts?