tiny step toward threading: reduce dependence on setlocale()
There was an unconference session at pgconf.dev related to threading
support. One of the problems identified was setlocale().
The attached series of patches make collation not depend on
setlocale(), even if the database collation uses the libc provider.
Since commit 8d9a9f034e, all supported platforms have locale_t, so we
can use strcoll_l(), etc., or uselocale() when no "_l" variant is
available.
A brief test shows that there may be a performance regression for libc
default collations. But if so, I'm not sure that's avoidable if the
goal is to take away setlocale. I'll see if removing the extra branches
mitigates it.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
Attachments:
On Wed, 2024-06-05 at 17:23 -0700, Jeff Davis wrote:
A brief test shows that there may be a performance regression for
libc
default collations. But if so, I'm not sure that's avoidable if the
goal is to take away setlocale. I'll see if removing the extra
branches
mitigates it.
I redid the test and didn't see a difference, and then I ran a
standalone microbenchmark to compare strcoll() vs strcoll_l(), and
didn't see a difference there, either.
Another implementation may show a difference, but it doesn't seem to be
a problem for glibc.
I think this patch series is a nice cleanup, as well, making libc more
like the other providers and not dependent on global state.
Regards,
Jeff Davis
On Thu, 2024-06-06 at 11:37 -0700, Jeff Davis wrote:
I think this patch series is a nice cleanup, as well, making libc
more
like the other providers and not dependent on global state.
New rebased series attached with additional cleanup. Now that
pg_locale_t is never NULL, we can simplify the way the collation cache
works, eliminating ~100 lines.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
Attachments:
On 15.06.24 01:35, Jeff Davis wrote:
On Thu, 2024-06-06 at 11:37 -0700, Jeff Davis wrote:
I think this patch series is a nice cleanup, as well, making libc
more
like the other providers and not dependent on global state.New rebased series attached with additional cleanup. Now that
pg_locale_t is never NULL, we can simplify the way the collation cache
works, eliminating ~100 lines.
Overall, this is great. I have wished for a simplification like this
for a long time. It is the logical continuation of relying on
locale_t stuff rather than process-global settings.
* v2-0001-Make-database-default-collation-internal-to-pg_lo.patch
+void
+pg_init_database_collation()
The function argument should be (void).
The prefix pg_ makes it sound like it's a user-facing function of some
sort. Is there a reason for it?
Maybe add a small comment before the function.
* v2-0002-Make-database-collation-pg_locale_t-always-non-NU.patch
There is quite a bit of code duplication from
pg_newlocale_from_collation(). Can we do this better?
* v2-0003-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patch
The "TODO" markers should be left, because what they refer to is that
these functions just use the default collation rather than something
passed in from the expression machinery. This is not addressed by
this change. (Obviously, the comments could have been clearer about
this.)
* v2-0004-Remove-support-for-null-pg_locale_t.patch
I found a few more places that should be adjusted in a similar way.
- In src/backend/regex/regc_pg_locale.c, the whole business with
pg_regex_locale, in particular in pg_set_regex_collation().
- In src/backend/utils/adt/formatting.c, various places such as
str_tolower().
- In src/backend/utils/adt/pg_locale.c, wchar2char() and char2wchar().
(Also, for wchar2char() there is one caller that explicitly passes
NULL.)
The changes at the call sites of pg_locale_deterministic() are
unfortunate, because they kind of go into the opposite direction: They
add checks for NULL locales instead of removing them. (For a minute I
was thinking I was reading your patch backwards.) We should think of
a way to write this clearer.
Looking for example at hashtext() after 0001..0004 applied, it is
if (!lc_collate_is_c(collid))
mylocale = pg_newlocale_from_collation(collid);
if (!mylocale || pg_locale_deterministic(mylocale))
{
But then after your 0006 patch, lc_locale_is_c() internally also calls
pg_newlocale_from_collation(), so this code just becomes redundant.
Better might be if pg_locale_deterministic() itself checked if collate
is C, and then hashtext() would just need to write:
mylocale = pg_newlocale_from_collation(collid);
if (pg_locale_deterministic(mylocale))
{
The patch sequencing might be a bit tricky here. Maybe it's ok if
patch 0004 stays as is in this respect if 0006 were to fix it back.
* v2-0005-Avoid-setlocale-in-lc_collate_is_c-and-lc_ctype_i.patch
Nothing uses this, AFAICT, so why?
Also, this creates more duplication between
pg_init_database_collation() and pg_newlocale_from_collation(), as
mentioned at patch 0002.
* v2-0006-Simplify-collation-cache.patch
ok
On Wed, 2024-06-19 at 11:15 +0200, Peter Eisentraut wrote:
* v2-0001-Make-database-default-collation-internal-to-pg_lo.patch
+void
+pg_init_database_collation()The function argument should be (void).
The prefix pg_ makes it sound like it's a user-facing function of >
some
sort. Is there a reason for it?Maybe add a small comment before the function.
Agreed, done.
* v2-0002-Make-database-collation-pg_locale_t-always-non-NU.patch
There is quite a bit of code duplication from
pg_newlocale_from_collation(). Can we do this better?
I refactored it into make_libc_collator().
* v2-0003-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patch
The "TODO" markers should be left, because what they refer to is
that
these functions just use the default collation rather than
something
passed in from the expression machinery. This is not addressed by
this change. (Obviously, the comments could have been clearer
about
this.)
Done.
* v2-0004-Remove-support-for-null-pg_locale_t.patch
I found a few more places that should be adjusted in a similar way.
Fixed, thank you.
The changes at the call sites of pg_locale_deterministic() are
unfortunate
Yeah, that part was a bit annoying.
But then after your 0006 patch, lc_locale_is_c() internally also >
calls
pg_newlocale_from_collation()
Not always. It still returns early for C_COLLATION_OID or
POSIX_COLLATION_OID, and that's actually required because
pg_regcomp(..., C_COLLATION_OID) is called when parsing pg_hba.conf,
before catalog access is available. I don't think that detail is
relevant in the cases you brought up, but it got in the way of some
other refactoring I was trying to do.
, so this code just becomes redundant.
Better might be if pg_locale_deterministic() itself checked if >
collate
is C, and then hashtext() would just need to write:mylocale = pg_newlocale_from_collation(collid);
if (pg_locale_deterministic(mylocale))
{The patch sequencing might be a bit tricky here. Maybe it's ok if
patch 0004 stays as is in this respect if 0006 were to fix it back.
Addressed in v3-0006.
* v2-0005-Avoid-setlocale-in-lc_collate_is_c-and-lc_ctype_i.patch
Nothing uses this, AFAICT, so why?
You're right. It was used to avoid setlocale() in lc_collate_is_c(),
but that's eliminated in the next patch anyway.
Also, in v3-0005, I had to also check for "C" or "POSIX" in
make_libc_collator(), so that it wouldn't try to actually create the
locale_t in that case.
Regards,
Jeff Davis
Attachments:
Nice refactoring!
Two small comments about CheckMyDatabase().
- Shouldn't we look at the default_locale.ctype_is_c when setting
database_ctype_is_c instead of doing a strcmp()? or maybe we should even
remove the global variable and always look at the default_locale?
- I think that the lookup of Anum_pg_database_datlocale could be done
later in the code since it is not needed when we use a libc locale. E.g.
as below.
if (dbform->datlocprovider == COLLPROVIDER_LIBC)
locale = collate;
else
{
datum = SysCacheGetAttr(DATABASEOID, tup,
Anum_pg_database_datlocale, &isnull);
if (!isnull)
locale = TextDatumGetCString(datum);
}
Also is there any reaosn you do not squash th 4th and the 6th patch?
Andreas
On Fri, 2024-07-26 at 19:38 +0200, Andreas Karlsson wrote:
Nice refactoring!
Two small comments about CheckMyDatabase().
- Shouldn't we look at the default_locale.ctype_is_c when setting
database_ctype_is_c instead of doing a strcmp()? or maybe we should
even
remove the global variable and always look at the default_locale?
database_ctype_is_c refers to the LC_CTYPE environment of the database
-- pg_database.datctype. default_locale.ctype_is_c is the ctype of the
database's default collation.
Confusing, I know, but it matters for a few things that still depend on
the LC_CTYPE, such as tsearch and maybe a few extensions. See
f413941f41.
- I think that the lookup of Anum_pg_database_datlocale could be done
later in the code since it is not needed when we use a libc locale.
E.g.
as below.
Done, thank you.
Also is there any reaosn you do not squash th 4th and the 6th patch?
Done. I had to rearrange the patch ordering a bit because prior to the
cache refactoring patch, it's unsafe to call
pg_newlocale_from_collation() without checking lc_collate_is_c() or
lc_ctype_is_c() first.
Regards,
Jeff Davis
Attachments:
On 7/26/24 10:35 PM, Jeff Davis wrote:
database_ctype_is_c refers to the LC_CTYPE environment of the database
-- pg_database.datctype. default_locale.ctype_is_c is the ctype of the
database's default collation.Confusing, I know, but it matters for a few things that still depend on
the LC_CTYPE, such as tsearch and maybe a few extensions. See
f413941f41.
Ah, right! That was thinko on my behalf.
The set of patches looks good to me now. There is further refactoring
that can be done in this area (and should be done given all calls e.g to
isalpha()) but I think this set of patches improves code readability
while moving us away from setlocale().
And even if we take a tiny performance hit here, which your tests did
not measure, I would say it is worth it both due to code clarity and due
to not relying on thread unsafe state.
I do not see these patches in the commitfest app but if they were I
would have marked them as ready for committer.
Andreas
On 27.07.24 21:03, Andreas Karlsson wrote:
On 7/26/24 10:35 PM, Jeff Davis wrote:
database_ctype_is_c refers to the LC_CTYPE environment of the database
-- pg_database.datctype. default_locale.ctype_is_c is the ctype of the
database's default collation.Confusing, I know, but it matters for a few things that still depend on
the LC_CTYPE, such as tsearch and maybe a few extensions. See
f413941f41.Ah, right! That was thinko on my behalf.
The set of patches looks good to me now. There is further refactoring
that can be done in this area (and should be done given all calls e.g to
isalpha()) but I think this set of patches improves code readability
while moving us away from setlocale().And even if we take a tiny performance hit here, which your tests did
not measure, I would say it is worth it both due to code clarity and due
to not relying on thread unsafe state.I do not see these patches in the commitfest app but if they were I
would have marked them as ready for committer.
Here: https://commitfest.postgresql.org/48/5023/
I have also re-reviewed the patches and I agree they are good to go.
On Mon, 2024-07-29 at 21:45 +0200, Peter Eisentraut wrote:
I have also re-reviewed the patches and I agree they are good to go.
I found a couple issues with the later patches:
* There was still some confusion about the default collation vs.
datcollate/datctype for callers of wchar2char() and char2wchar() (those
functions only work for libc). I introduced a new pg_locale_t structure
to represent datcollate/datctype regardless of datlocprovider to solve
this.
* Another loose end relying on setlocale(): in selfuncs.c, there's
still a call directly to strxfrm(), which depends on setlocale(). I
changed this to lookup the collation and then use pg_strxfrm(). That
should improve histogram selectivity estimates because it uses the
correct provider, rather than relying on setlocale(), right?
New series attached.
Regards,
Jeff Davis
Attachments:
On Tue, 2024-07-30 at 12:13 -0700, Jeff Davis wrote:
I found a couple issues with the later patches:
* There was still some confusion about the default collation vs.
datcollate/datctype for callers of wchar2char() and char2wchar()
(those
functions only work for libc). I introduced a new pg_locale_t
structure
to represent datcollate/datctype regardless of datlocprovider to
solve
this.
I didn't quite like the API I introduced in 0001, so I skipped 0001.
For 0002 I left char2wchar() and wchar2char() as-is, where they expect
libc and accept a NULL pg_locale_t. I committed the rest of 0002.
* Another loose end relying on setlocale(): in selfuncs.c, there's
still a call directly to strxfrm(), which depends on setlocale(). I
changed this to lookup the collation and then use pg_strxfrm(). That
should improve histogram selectivity estimates because it uses the
correct provider, rather than relying on setlocale(), right?
Committed 0003.
With these changes, collations are no longer dependent on the
environment locale (setlocale()) at all for either collation behavior
(ORDER BY) or ctype behavior (LOWER(), etc.).
Additionally, unless I missed something, nothing in the server is
dependent on LC_COLLATE at all.
There are still some things that depend on setlocale() in one way or
another:
- char2wchar() & wchar2char()
- ts_locale.c
- various places that depend on LC_CTYPE unrelated to the collation
infrastructure
- things that depend on other locale settings, like LC_NUMERIC
We can address those as part of a separate thread. I'll count this as
committed.
Regards,
Jeff Davis
On 06.08.24 23:40, Jeff Davis wrote:
With these changes, collations are no longer dependent on the
environment locale (setlocale()) at all for either collation behavior
(ORDER BY) or ctype behavior (LOWER(), etc.).Additionally, unless I missed something, nothing in the server is
dependent on LC_COLLATE at all.There are still some things that depend on setlocale() in one way or
another:- char2wchar() & wchar2char()
- ts_locale.c
- various places that depend on LC_CTYPE unrelated to the collation
infrastructure
- things that depend on other locale settings, like LC_NUMERICWe can address those as part of a separate thread. I'll count this as
committed.
I have a couple of small follow-up patches for this.
First, in like.c, SB_lower_char() now reads:
static char
SB_lower_char(unsigned char c, pg_locale_t locale, bool locale_is_c)
{
if (locale_is_c)
return pg_ascii_tolower(c);
else if (locale)
return tolower_l(c, locale->info.lt);
else
return pg_tolower(c);
}
But after this patch set, locale cannot be NULL anymore, so the third
branch is obsolete.
(Now that I look at it, pg_tolower() has some short-circuiting for ASCII
letters, so it would not handle Turkish-i correctly if that had been the
global locale. By removing the use of pg_tolower(), we fix that issue
in passing.)
Second, there are a number of functions in like.c like the above that
take separate arguments like pg_locale_t locale, bool locale_is_c.
Because pg_locale_t now contains the locale_is_c information, these can
be combined.
Attachments:
On Wed, 2024-08-07 at 22:44 +0200, Peter Eisentraut wrote:
But after this patch set, locale cannot be NULL anymore, so the third
branch is obsolete.
...
Second, there are a number of functions in like.c like the above that
take separate arguments like pg_locale_t locale, bool locale_is_c.
Because pg_locale_t now contains the locale_is_c information, these
can
be combined.
I believe these patches are correct, but the reasoning is fairly
complex:
1. Some MatchText variants are called with 0 for locale. But that's OK
because ...
2. A MatchText variant only cares about the locale if MATCH_LOWER(t) is
defined, and ...
3. Only one variant, SB_IMatchText() defines MATCH_LOWER(), and ...
4. SB_IMatchText() is called with a non-zero locale.
All of these are a bit confusing to follow because it's generated code.
#2 is particularly non-obvious, because "locale" is not even an
argument of the MATCH_LOWER(t) or GETCHAR(t) macros, it's taken
implicitly from the outer scope.
I don't think your patches cause this confusion, but is there a way you
can clarify some of this along the way?
Regards,
Jeff Davis
On 07.08.24 22:44, Peter Eisentraut wrote:
(Now that I look at it, pg_tolower() has some short-circuiting for ASCII
letters, so it would not handle Turkish-i correctly if that had been the
global locale. By removing the use of pg_tolower(), we fix that issue
in passing.)
It occurred to me that this issue also surfaces in a more prominent
place. These arguably-wrong pg_tolower() and pg_toupper() calls were
also used by the normal SQL lower() and upper() functions before commit
e9931bfb751 if you used a single byte encoding.
For example, in PG17, multi-byte encoding:
initdb --locale=tr_TR.utf8
select upper('hij'); --> HİJ
PG17, single-byte encoding:
initdb --locale=tr_TR # uses LATIN5
select upper('hij'); --> HIJ
With current master, after commit e9931bfb751, you get the first result
in both cases.
So this could break indexes across pg_upgrade in such configurations.
On 08.08.24 22:00, Jeff Davis wrote:
On Wed, 2024-08-07 at 22:44 +0200, Peter Eisentraut wrote:
But after this patch set, locale cannot be NULL anymore, so the third
branch is obsolete....
Second, there are a number of functions in like.c like the above that
take separate arguments like pg_locale_t locale, bool locale_is_c.
Because pg_locale_t now contains the locale_is_c information, these
can
be combined.I believe these patches are correct, but the reasoning is fairly
complex:1. Some MatchText variants are called with 0 for locale. But that's OK
because ...2. A MatchText variant only cares about the locale if MATCH_LOWER(t) is
defined, and ...3. Only one variant, SB_IMatchText() defines MATCH_LOWER(), and ...
4. SB_IMatchText() is called with a non-zero locale.
All of these are a bit confusing to follow because it's generated code.
#2 is particularly non-obvious, because "locale" is not even an
argument of the MATCH_LOWER(t) or GETCHAR(t) macros, it's taken
implicitly from the outer scope.I don't think your patches cause this confusion, but is there a way you
can clarify some of this along the way?
Yes, this is also my analysis. The patch in
</messages/by-id/700d2e86-bf75-4607-9cf2-f5b7802f6e88@eisentraut.org>
would replace passing 0 with an actual locale object. The changes in
GenericMatchText() could also be applied independently so that we'd
always pass in a non-zero locale value, even if it would not be used in
some cases. I need to update that patch to cover your latest changes.
I'll see if I can propose something here that looks a bit nicer.
On Thu, 2024-08-08 at 23:09 +0200, Peter Eisentraut wrote:
With current master, after commit e9931bfb751, you get the first
result
in both cases.So this could break indexes across pg_upgrade in such configurations.
Good observation. Is there any action we should take here, or just add
that to the release notes for 18?
Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes:
We can address those as part of a separate thread. I'll count this as
committed.
Coverity has a nit about this:
*** CID 1616189: Null pointer dereferences (REVERSE_INULL)
/srv/coverity/git/pgsql-git/postgresql/src/backend/utils/adt/like.c: 206 in Generic_Text_IC_like()
200 * on the pattern and text, but instead call SB_lower_char on each
201 * character. In the multi-byte case we don't have much choice :-(. Also,
202 * ICU does not support single-character case folding, so we go the long
203 * way.
204 */
205
CID 1616189: Null pointer dereferences (REVERSE_INULL)
Null-checking "locale" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
206 if (pg_database_encoding_max_length() > 1 || (locale && locale->provider == COLLPROVIDER_ICU))
207 {
208 pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
209 PointerGetDatum(pat)));
210 p = VARDATA_ANY(pat);
211 plen = VARSIZE_ANY_EXHDR(pat);
I assume it would now be okay to take out "locale &&" here?
regards, tom lane
On 11.08.24 18:33, Tom Lane wrote:
Jeff Davis <pgsql@j-davis.com> writes:
We can address those as part of a separate thread. I'll count this as
committed.Coverity has a nit about this:
*** CID 1616189: Null pointer dereferences (REVERSE_INULL)
/srv/coverity/git/pgsql-git/postgresql/src/backend/utils/adt/like.c: 206 in Generic_Text_IC_like()
200 * on the pattern and text, but instead call SB_lower_char on each
201 * character. In the multi-byte case we don't have much choice :-(. Also,
202 * ICU does not support single-character case folding, so we go the long
203 * way.
204 */
205CID 1616189: Null pointer dereferences (REVERSE_INULL)
Null-checking "locale" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.206 if (pg_database_encoding_max_length() > 1 || (locale && locale->provider == COLLPROVIDER_ICU))
207 {
208 pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
209 PointerGetDatum(pat)));
210 p = VARDATA_ANY(pat);
211 plen = VARSIZE_ANY_EXHDR(pat);I assume it would now be okay to take out "locale &&" here?
Correct, that check is no longer necessary.
On Mon, 2024-08-12 at 09:00 +0200, Peter Eisentraut wrote:
I assume it would now be okay to take out "locale &&" here?
Correct, that check is no longer necessary.
Thank you, fixed.
Regards,
Jeff Davis
I'm wondering if after this patch series, lc_collate_is_c() and
lc_ctype_is_c() are still useful.
They used to be completely separate from pg_newlocale_from_collation(),
but now they are just mostly a thin wrapper around it. Except there is
some hardcoded handling of C_COLLATION_OID and POSIX_COLLATION_OID. Do
we care about that?
In many places, the notational and structural complexity would be
significantly improved if we changed code like
if (pg_collate_is_c(colloid))
{
...
}
else
{
pg_locale_t locale = pg_newlocale_from_collation(colloid);
if (locale->provider == ...)
{
...
}
to more like
pg_locale_t locale = pg_newlocale_from_collation(colloid);
if (locale->collate_is_c)
{
...
}
else if (locale->provider == ...)
...
}
...
However, it's not clear whether the hardcoded handling of some
collations is needed for performance parity or perhaps some
bootstrapping reasons. It would be useful to get that cleared up.
Thoughts?
On Tue, 2024-08-13 at 17:11 +0200, Peter Eisentraut wrote:
They used to be completely separate from
pg_newlocale_from_collation(),
but now they are just mostly a thin wrapper around it. Except there
is
some hardcoded handling of C_COLLATION_OID and POSIX_COLLATION_OID.
Do
we care about that?
...
However, it's not clear whether the hardcoded handling of some
collations is needed for performance parity or perhaps some
bootstrapping reasons. It would be useful to get that cleared up.
Thoughts?
There's at least one place where we expect lc_collate_is_c() to work
without catalog access at all: libpq/hba.c uses regexes with
C_COLLATION_OID.
But I don't think that's a major problem -- we can just move the
hardcoded test into pg_newlocale_from_collation() and return a
predefined struct with collate_is_c/ctype_is_c already set.
+1.
Regards,
Jeff Davis
On 8/13/24 7:56 PM, Jeff Davis wrote:
But I don't think that's a major problem -- we can just move the
hardcoded test into pg_newlocale_from_collation() and return a
predefined struct with collate_is_c/ctype_is_c already set.
I tried that out but thought it felt cleaner to do the hardcoding in
pg_set_regex_collation(). What do you think?
I have attached patches removing lc_collate_is_c() and lc_ctype_is_c().
I have not checked if there are any performance regressions when using
the C and POSIX locales but remove these special cases makes the code a
lot cleaner in my book.
I also attach some other clean up patches I did while touching this code.
0001-Remove-lc_collate_is_c.patch
Removes lc_collate_is_c().
0002-Remove-lc_ctype_is_c.patch
Removes lc_ctype_is_c() and POSIX_COLLATION_OID which is no longer
necessary.
0003-Remove-dubious-check-against-default-locale.patch
This patch removes a check against DEFAULT_COLLATION_OID which I thought
looked really dubious. Shouldn't this just be a simple check for if the
locale is deterministic? Since we know we have a valid locale that
should be enough, right?
0004-Do-not-check-both-for-collate_is_c-and-deterministic.patch
It is redundant to check both for "collation_is_c && deterministic", right?
0005-Remove-pg_collate_deterministic-and-check-field-dire.patch
Since after my patches we look a lot directly at the collation_is_c and
ctype_is_c fields I think the thin wrapper around the deterministic
field makes it seem like there is more to it so I suggest that we should
just remove it.
0006-Slightly-refactor-varstr_sortsupport-to-improve-read.patch
Small refactor to make a hard to read function a bit easier to read.
Andreas
Attachments:
On Wed, 2024-08-14 at 01:31 +0200, Andreas Karlsson wrote:
0001-Remove-lc_collate_is_c.patch
Removes lc_collate_is_c().
This could use some performance testing, as the commit message says,
otherwise it looks good.
0002-Remove-lc_ctype_is_c.patch
Removes lc_ctype_is_c() and POSIX_COLLATION_OID which is no longer
necessary.
This overlaps a bit with what Peter already proposed here:
/messages/by-id/4f562d84-87f4-44dc-8946-01d6c437936f@eisentraut.org
right?
0003-Remove-dubious-check-against-default-locale.patch
This patch removes a check against DEFAULT_COLLATION_OID which I
thought
looked really dubious. Shouldn't this just be a simple check for if
the
locale is deterministic? Since we know we have a valid locale that
should be enough, right?
Looks good to me. The code was correct in the sense that the default
collation is always deterministic, but (a) Peter is working on non-
deterministic default collations; and (b) it was a redundant check.
0004-Do-not-check-both-for-collate_is_c-and-deterministic.patch
It is redundant to check both for "collation_is_c && deterministic",
right?
+1.
0005-Remove-pg_collate_deterministic-and-check-field-dire.patch
Since after my patches we look a lot directly at the collation_is_c
and
ctype_is_c fields I think the thin wrapper around the deterministic
field makes it seem like there is more to it so I suggest that we
should
just remove it.
+1. When I added that, there was also a NULL check, but that was
removed so we might as well just read the field.
0006-Slightly-refactor-varstr_sortsupport-to-improve-read.patch
Small refactor to make a hard to read function a bit easier to read.
Looks good to me.
Regards,
Jeff Davis
On 8/15/24 12:55 AM, Jeff Davis wrote:
This overlaps a bit with what Peter already proposed here:
/messages/by-id/4f562d84-87f4-44dc-8946-01d6c437936f@eisentraut.org
right?
Maybe I am missing something but not as far as I can see. It is related
but I do not see any overlap.
0003-Remove-dubious-check-against-default-locale.patch
This patch removes a check against DEFAULT_COLLATION_OID which I
thought
looked really dubious. Shouldn't this just be a simple check for if
the
locale is deterministic? Since we know we have a valid locale that
should be enough, right?Looks good to me. The code was correct in the sense that the default
collation is always deterministic, but (a) Peter is working on non-
deterministic default collations; and (b) it was a redundant check.
Suspected so.
I have attached a set of rebased patches with an added assert. Maybe I
should start a new thread though.
Andreas
Attachments:
On Wed, 2024-08-28 at 18:43 +0200, Andreas Karlsson wrote:
On 8/15/24 12:55 AM, Jeff Davis wrote:
This overlaps a bit with what Peter already proposed here:
/messages/by-id/4f562d84-87f4-44dc-8946-01d6c437936f@eisentraut.org
right?
Maybe I am missing something but not as far as I can see. It is
related
but I do not see any overlap.
v2-0002:
Oh, I see, pattern_char_isalpha() only has one caller, and it never
passes a NULL for pg_locale_t, so my complaint doesn't affect your
patch. This is about ready, then.
0003-Remove-dubious-check-against-default-locale.patch
This patch removes a check against DEFAULT_COLLATION_OID which I
thought
looked really dubious. Shouldn't this just be a simple check for
if
the
locale is deterministic? Since we know we have a valid locale
that
should be enough, right?Looks good to me. The code was correct in the sense that the
default
collation is always deterministic, but (a) Peter is working on non-
deterministic default collations; and (b) it was a redundant check.Suspected so.
I have attached a set of rebased patches with an added assert. Maybe
I
should start a new thread though.
v2-0001:
* There was a performance regression for repeated lookups, such as a "t
< 'xyz'" predicate. I committed 12d3345c0d (to remember the last
collation lookup), which will prevent that regression.
* This patch may change the handling of collation oid 0, and I'm not
sure whether that was intentional or not. lc_collate_is_c(0) returned
false, whereas pg_newlocale_from_collation(0)->collate_is_c raised
Assert or threw an cache lookup error. I'm not sure if this is an
actual problem, but looking at the callers, they should be more
defensive if they expect a collation oid of 0 to work at all.
* The comment in lc_collate_is_c() said that it returned false with the
idea that the caller would throw a useful error, and I think that
comment has been wrong for a while. If it returns false, the caller is
expected to call pg_newlocale_from_collation(), which would Assert on a
debug build. Should we remove that Assert, too, so that it will
consistently throw a cache lookup failure?
Regards,
Jeff Davis
Committed v2-0001.
On Tue, 2024-09-03 at 22:04 -0700, Jeff Davis wrote:
* This patch may change the handling of collation oid 0, and I'm not
sure whether that was intentional or not. lc_collate_is_c(0) returned
false, whereas pg_newlocale_from_collation(0)->collate_is_c raised
Assert or threw an cache lookup error. I'm not sure if this is an
actual problem, but looking at the callers, they should be more
defensive if they expect a collation oid of 0 to work at all.
For functions that do call pg_newlocale_from_collation() when
lc_collation_is_c() returns false, the behavior is unchanged.
There are only 3 callers which don't follow that pattern:
* spg_text_inner_consistent: gets collation ID from the index, and
text is a collatable type so it will be valid
* match_pattern_prefix: same
* make_greater_string: I changed the order of the tests in
make_greater_string so that if len=0 and collation=0, it won't throw an
error. If len !=0, it goes into varstr_cmp(), which will call
pg_newlocale_from_collation().
* The comment in lc_collate_is_c() said that it returned false with
the
idea that the caller would throw a useful error, and I think that
comment has been wrong for a while. If it returns false, the caller
is
expected to call pg_newlocale_from_collation(), which would Assert on
a
debug build. Should we remove that Assert, too, so that it will
consistently throw a cache lookup failure?
I fixed this by replacing the assert with an elog(ERROR, ...), so that
it will consistently show a "cache lookup failed for collation 0"
regardless of whether it's a debug build or not. It's not expected that
the error will be encountered.
Regards,
Jeff Davis
On 08.08.24 22:00, Jeff Davis wrote:
On Wed, 2024-08-07 at 22:44 +0200, Peter Eisentraut wrote:
But after this patch set, locale cannot be NULL anymore, so the third
branch is obsolete....
Second, there are a number of functions in like.c like the above that
take separate arguments like pg_locale_t locale, bool locale_is_c.
Because pg_locale_t now contains the locale_is_c information, these
can
be combined.I believe these patches are correct, but the reasoning is fairly
complex:1. Some MatchText variants are called with 0 for locale. But that's OK
because ...2. A MatchText variant only cares about the locale if MATCH_LOWER(t) is
defined, and ...3. Only one variant, SB_IMatchText() defines MATCH_LOWER(), and ...
4. SB_IMatchText() is called with a non-zero locale.
All of these are a bit confusing to follow because it's generated code.
#2 is particularly non-obvious, because "locale" is not even an
argument of the MATCH_LOWER(t) or GETCHAR(t) macros, it's taken
implicitly from the outer scope.I don't think your patches cause this confusion, but is there a way you
can clarify some of this along the way?
In the end, I figured the best thing to do here is to add an explicit
locale argument to MATCH_LOWER() and GETCHAR() so one can actually
understand this code by reading it. New patch attached.
Attachments:
On Mon, 2024-09-09 at 15:37 +0200, Peter Eisentraut wrote:
In the end, I figured the best thing to do here is to add an explicit
locale argument to MATCH_LOWER() and GETCHAR() so one can actually
understand this code by reading it. New patch attached.
Looks good to me, thank you.
Regards,
Jeff Davis
On 9/4/24 11:45 PM, Jeff Davis wrote:
Committed v2-0001.
[...]
I fixed this by replacing the assert with an elog(ERROR, ...), so that
it will consistently show a "cache lookup failed for collation 0"
regardless of whether it's a debug build or not. It's not expected that
the error will be encountered.
Thanks!
Andreas
On 12.09.24 22:01, Jeff Davis wrote:
On Mon, 2024-09-09 at 15:37 +0200, Peter Eisentraut wrote:
In the end, I figured the best thing to do here is to add an explicit
locale argument to MATCH_LOWER() and GETCHAR() so one can actually
understand this code by reading it. New patch attached.Looks good to me, thank you.
committed, thanks
Hi Jeff,
On Thu, 25 Sept 2025 at 11:08, Jeff Davis <pgsql@j-davis.com> wrote:
Committed v2-0001.
06421b084364 commit broke possibility to call GetSharedSecurityLabel()
from ClientAuthentication_hook.
Now GetSharedSecurityLabel() calls fail with the error "cannot read
pg_class without having selected a database".
Here is gdb stack trace:
#0 ScanPgRelation (targetRelId=3456, indexOK=true,
force_non_historic=false) at relcache.c:354
#1 0x0000565eec153a1e in RelationBuildDesc (targetRelId=3456,
insertIt=<optimized out>) at relcache.c:1113
#2 0x0000565eec155a7d in RelationIdGetRelation (relationId=<optimized
out>, relationId@entry=3456) at relcache.c:2142
#3 0x0000565eebcbafb7 in relation_open (relationId=3456,
lockmode=lockmode@entry=1) at relation.c:58
#4 0x0000565eebd2ea69 in table_open (relationId=<optimized out>,
lockmode=lockmode@entry=1) at table.c:44
#5 0x0000565eec1434e4 in CatalogCacheInitializeCache
(cache=0x565f17d35d80) at catcache.c:1124
#6 0x0000565eec144610 in ConditionalCatalogCacheInitializeCache
(cache=0x565f17d35d80) at catcache.c:1084
#7 SearchCatCacheInternal (cache=0x565f17d35d80, nkeys=nkeys@entry=1,
v1=v1@entry=950, v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) at
catcache.c:1410
#8 0x0000565eec144f65 in SearchCatCache1 (cache=<optimized out>,
v1=v1@entry=950) at catcache.c:1360
#9 0x0000565eec158033 in SearchSysCache1 (cacheId=cacheId@entry=16,
key1=key1@entry=950) at syscache.c:228
#10 0x0000565eec0d72e8 in create_pg_locale (context=0x565f17cd0b00,
collid=950) at ../../../../src/include/postgres.h:259
#11 pg_newlocale_from_collation (collid=950) at pg_locale.c:1226
#12 0x0000565eec1333de in varstr_cmp (arg1=0x770ce64fdf61 "<redacted>",
len1=13, arg2=0x565f17ccaacc "<redacted>\252\314\027_V", len2=13,
collid=<optimized out>)
at varlena.c:1617
#13 0x0000565eec133c74 in bttextcmp (fcinfo=0x7ffd34181740) at
varlena.c:1892
#14 0x0000565eec165ab6 in FunctionCall2Coll
(flinfo=flinfo@entry=0x7ffd34182358,
collation=<optimized out>, arg1=<optimized out>, arg2=<optimized out>) at
fmgr.c:1161
#15 0x0000565eebd13bfe in _bt_compare (rel=<optimized out>,
key=0x7ffd341822a0, page=<optimized out>, offnum=<optimized out>) at
nbtsearch.c:770
#16 0x0000565eebd13eb2 in _bt_binsrch (rel=rel@entry=0x770ce1cdd0c0,
key=key@entry=0x7ffd341822a0, buf=<optimized out>) at nbtsearch.c:402
#17 0x0000565eebd14bab in _bt_first (scan=scan@entry=0x565f17ccaaf0,
dir=dir@entry=ForwardScanDirection) at nbtsearch.c:1551
#18 0x0000565eebd10e46 in btgettuple (scan=0x565f17ccaaf0,
dir=ForwardScanDirection) at nbtree.c:245
#19 0x0000565eebd063c9 in index_getnext_tid (scan=0x565f17ccaaf0,
direction=<optimized out>) at indexam.c:637
#20 0x0000565eebd0651d in index_getnext_slot (scan=0x565f17ccaaf0,
direction=direction@entry=ForwardScanDirection, slot=0x565f17cca680) at
indexam.c:729
#21 0x0000565eebd05627 in systable_getnext
(sysscan=sysscan@entry=0x565f17cca638)
at genam.c:520
#22 0x0000565eebe14c65 in GetSharedSecurityLabel (provider=0x770d6bc2a7dd
"<redacted>", object=0x7ffd34182e14) at seclabel.c:252
#23 0x0000770d6bc1fb0b in custom_check_auth (port=0x565f17b50788,
am_walsender=<optimized out>) at custom_auth.c:2414
#24 0x0000565eebea37fc in ClientAuthentication (port=port@entry=0x565f17b50788)
at auth.c:750
#25 0x0000565eec16cc95 in PerformAuthentication (port=0x565f17b50788) at
postinit.c:265
#26 InitPostgres (in_dbname=in_dbname@entry=0x565f17c9a930 "postgres",
dboid=dboid@entry=0, username=username@entry=0x565f17b51468 "<redacted>",
useroid=useroid@entry=0, flags=1, out_dbname=out_dbname@entry=0x0) at
postinit.c:923
#27 0x0000565eec01b2d0 in PostgresMain (dbname=0x565f17c9a930 "postgres",
username=0x565f17b51468 "<redacted>") at postgres.c:4289
#28 0x0000565eec0174ef in BackendMain (startup_data=<optimized out>,
startup_data_len=<optimized out>) at backend_startup.c:124
#29 0x0000565eebf80152 in postmaster_child_launch (child_type=B_BACKEND,
child_slot=1, startup_data=startup_data@entry=0x7ffd34184300,
startup_data_len=startup_data_len@entry=24,
client_sock=client_sock@entry=0x7ffd34184320) at launch_backend.c:290
#30 0x0000565eebf83ab7 in BackendStartup (client_sock=0x7ffd34184320) at
postmaster.c:3587
#31 ServerLoop () at postmaster.c:1702
#32 0x0000565eebf853e1 in PostmasterMain (argc=argc@entry=1,
argv=argv@entry=0x565f17b4f450)
at postmaster.c:1400
#33 0x0000565eebcab360 in main (argc=1, argv=0x565f17b4f450) at main.c:227
Line numbers could be a bit off, but I think it is not very important.
pg_shseclabel.provider data type is TEXT with collation C.
When doing btree lookup by (objoid, classoid, provider) we need to take
into account collation.
varstr_cmp() was used to make an exception for C collation, and now it has
to do a cache lookup for pg_collation to determine that this is a C
collation.
However, caches are not yet initialized at the moment
when ClientAuthentication_hook is executed, and they can't be, because
database access is not yet allowed.
Or, in other words, this commit made shared pg_catalog relations depend on
non shared relations.
Regards,
--
Alexander Kukushkin
On Thu, 2025-09-25 at 11:31 +0200, Alexander Kukushkin wrote:
Hi Jeff,
On Thu, 25 Sept 2025 at 11:08, Jeff Davis <pgsql@j-davis.com> wrote:
Committed v2-0001.
06421b084364 commit broke possibility to
call GetSharedSecurityLabel() from ClientAuthentication_hook.
Now GetSharedSecurityLabel() calls fail with the error "cannot read
pg_class without having selected a database".
...
pg_shseclabel.provider data type is TEXT with collation C.
When doing btree lookup by (objoid, classoid, provider) we need to
take into account collation.
varstr_cmp() was used to make an exception for C collation, and now
it has to do a cache lookup for pg_collation to determine that this
is a C collation.
However, caches are not yet initialized at the moment
when ClientAuthentication_hook is executed, and they can't be,
because database access is not yet allowed.Or, in other words, this commit made shared pg_catalog relations
depend on non shared relations.
Thank you, patch attached, intended for backport to 18.
Are you aware of any way to encounter this bug without using an
extension?
Regards,
Jeff Davis
Attachments:
Hi Jeff,
On Fri, 24 Oct 2025 at 04:37, Jeff Davis <pgsql@j-davis.com> wrote:
Thank you, patch attached, intended for backport to 18.
I didn't check it with 18 because changes in regc_pg_locale.c heavily rely
on 5a38104.
However, I think changes to pg_locale.c should be enough to fix the problem
in 18.
Are you aware of any way to encounter this bug without using an
extension?
I don't think so, but it would be nice to have a test for it anyway.
Unfortunately this test will require a shared_preload_library module.
Regards,
--
Alexander Kukushkin